Abstract
1. Introduction


In this paper, we investigate the expressive power of several basic fragments
of Tarski’s relation algebra [5] on finite tree-structured graphs. Tarski’s algebra
is a fundamental tool in the field of algebraic logic which finds various applications
in computer science [3H6]. Our investigation is specifically motivated
by the role the relation algebra plays in the study of database query languages
[THIS]. In particular, the algebras we consider in this paper correspond to natural
fragments of XPath. XPath is a simple language for navigation in XML
documents (i.e., a standard syntax for representing node-labeled trees), which
is at the heart of standard XML transformation languages and other XML technologies
[Hj. Keeping in the spirit of XML, we will continue to speak in what
follows of trees as “documents” and the algebras we study as “XPath” algebras.

XPath can be viewed as a query language in which an expression associates 
to every document a binary relation on its nodes representing all navigation 
paths in the document defined by that expression laiiaiis]. From this query-level
perspective, several natural semantic issues have been investigated in recent
years for various fragments of XPath. These include expressibility, closure
properties, and complexity of evaluation [SI H Ha [13 EH], as well as decision 
problems such as satisfiability, containment, and equivalence EMU- 

Alternatively, we can view XPath as a navigational tool on a particular given 
document, and study expressiveness issues from this document-level perspective. 
(A similar duality exists in the relational database model, where Bancilhon [SI] 
and Paredaens [23] considered and characterized expressiveness at the instance 
level, which, subsequently, Chandra and Harel [24] contrasted with expressiveness
at the query level.)

In this setting, our goal is to characterize, for various natural fragments of 
XPath, when a binary relation on the nodes of a given document (i.e., a set of 
navigation paths) is definable by an expression in the fragment. 

To achieve this goal, we develop a robust two-step methodology. The first
step consists of characterizing when two nodes in a document cannot be distinguished
by an expression in the fragment under consideration. It turns out, for
the fragments we consider, that this notion of expression equivalence of nodes
is equivalent to an appropriate generalization of the classic notion of bisimilarity
[53]. The second step of our methodology then consists of bootstrapping
this result to a characterization of when a binary relation on the nodes of a
given document is definable by an expression in the fragment (in the sense of
the previous paragraph).

We refer to this perspective on the semantics of XPath at the document level 
as the “global view.” In contrast with this global view, there is also a “local 
view” which we consider. In this view, one is only interested in the nodes to 
which one can navigate starting from a particular given node in the document 
under consideration. From this perspective, a set of nodes of that document can 
be seen as the end points of a set of paths starting at the given node. For each 
of the XPath fragments considered, we characterize when such a set represents 
the set of all paths starting at the given node defined by some expression in the 
fragment. These characterizations are derived from the corresponding characterizations
in the “global view,” and turn out to be particularly elegant in the
important special case where the starting node is the root.

In this paper, we study several natural XPath fragments. The most expressive
among them is the XPath algebra, which permits the self, parent, and child
operators, predicates, compositions, and the boolean operators union, intersection,
and difference. (Since we work at the document level, i.e., the document
is given, there is no need to include the ancestor and descendant operators
as primitives.) We also consider the core XPath algebra, which is the XPath
algebra without intersection and difference at the expression level. The core
XPath algebra is the adaptation to our setting of Core XPath of Gottlob et
al. [Hinilll]. For both of these algebras, we also consider various “downward”
and “upward” fragments without the parent and child operator, respectively.
We also study “positive” variants of all the fragments considered, without the
difference operator.

Our strategy is to introduce and characterize generalizations of each of these 
practical fragments, towards a broader perspective on relation algebras on trees. 
These generalizations are based on a simple notion of path counting, a feature 
which also appears in XPath. 

The robustness of the characterizations provided in this paper is further
strengthened by their feasibility. As discussed later in the paper, the global and local
definability problems for each of the XPath fragments are decidable in polynomial
time. This feasibility hints at efficient partitioning and reduction
techniques on both the set of nodes and the set of paths in a document. Such
techniques may be fruitfully applied towards, e.g., document compression [27], access
control [28], and designing indexes for query processing miiisiisoi.

We proceed in the paper as follows. In Section 2, we formally define documents
and the algebras, and then in Section 3 we define a notion of “signatures”
which will be essential in the sequel. In Section 4, we define the semantic
and syntactic notions of node distinguishability necessary to obtain our desired
structural characterizations. In the balance of the paper, we apply our two-step
methodology to link semantic expression equivalence in the languages to appropriate
structural syntactic equivalence notions. In particular, we give structural
characterizations, under both the global and local views,

• of “strictly” and “weakly” downward languages, and their positive variants;

• of upward languages and their positive variants; and,

• of languages with both downward and upward navigation, and their positive
variants.

Along the way, we also establish the equivalence of some of these fragments,
using the structural characterizations obtained. We conclude with
a discussion of some ramifications of our results and directions for further study.
Figure 1: Example document.


2. Documents and navigation 

In this paper, we are interested in navigating over documents in the form
of unordered labeled trees. Formally, we denote such a document as
$D = (V, Ed, r, \lambda)$, with $D$ the document name, $V$ the set of nodes of the tree, $Ed$
the set of edges of the tree, $r$ the root of the tree, and $\lambda : V \to \mathcal{L}$ a function
assigning to each node a label from some infinite set of labels $\mathcal{L}$.

Example 2.1. Figure 1 shows an example of a document that will be used
throughout the paper. Here, $r = v_1$ is the root of the tree with label $\lambda(v_1) = a$.

We next define a set of operations on documents, as tabulated in Table 1.
The left column shows the syntax of the operation, and the right column its
semantics, given a document $D = (V, Ed, r, \lambda)$. Notice that, in each case, the
result is a binary relation on the nodes of the document.

The basic algebra, denoted $\mathcal{X}$, is the language consisting of all expressions
built from $\emptyset$, $\varepsilon$, $\hat{\ell}$ with $\ell \in \mathcal{L}$, composition (“$/$”), and union (“$\cup$”). The basic
algebra $\mathcal{X}$ can be extended by adding some of the other operations in Table 1,
which we call nonbasic. If $E$ is a set of nonbasic operations, then $\mathcal{X}(E)$ denotes
the algebra obtained by adding the operations in $E$ to the basic algebra $\mathcal{X}$.
When writing expressions, we assume that unary operations take precedence
over binary operations, and that composition takes precedence over the set
operations.

Notice that we do not consider transitive closure operations such as the
descendant (“$\downarrow^*$”) or ancestor (“$\uparrow^*$”) operations of XPath. The reason for this is
that, in this paper, we only consider navigation within a single, given document.

Table 1: Binary operations on documents. The left column shows the syntax of the operation,
and the right column its semantics, given a document $D = (V, Ed, r, \lambda)$. Below, $\ell$ is a label
in $\mathcal{L}$ and $k \geq 1$ a natural number. Furthermore, in the recursive definitions, $e$, $e_1$, and $e_2$
represent expressions built with the operations.

Syntax                       Semantics

$\emptyset$                  $\emptyset(D) = \emptyset$
$\varepsilon$                $\varepsilon(D) = \{(v, v) \mid v \in V\}$
$\hat{\ell}$                 $\hat{\ell}(D) = \{(v, v) \mid v \in V \ \&\ \lambda(v) = \ell\}$
$\downarrow$                 $\downarrow(D) = Ed$
$\uparrow$                   $\uparrow(D) = Ed^{-1}$
$\pi_1(e)$                   $\pi_1(e)(D) = \{(v, v) \mid (\exists w)\,(v, w) \in e(D)\}$
$\pi_2(e)$                   $\pi_2(e)(D) = \{(w, w) \mid (\exists v)\,(v, w) \in e(D)\}$
$e^{-1}$                     $e^{-1}(D) = e(D)^{-1}$
$\mathrm{ch}_{\geq k}(e)$    $\mathrm{ch}_{\geq k}(e)(D) = \{(v, v) \mid v \in V \ \&\ |\{w \mid (v, w) \in Ed \ \&\ (w, w) \in \pi_1(e)(D)\}| \geq k\}$
$e_1/e_2$                    $e_1/e_2(D) = \{(u, w) \mid (\exists v)((u, v) \in e_1(D) \ \&\ (v, w) \in e_2(D))\}$
$e_1 \cup e_2$               $e_1 \cup e_2(D) = e_1(D) \cup e_2(D)$
$e_1 \cap e_2$               $e_1 \cap e_2(D) = e_1(D) \cap e_2(D)$
$e_1 - e_2$                  $e_1 - e_2(D) = e_1(D) - e_2(D)$

Example 2.2. Consider the document $D$ in Figure 1. Let $e$ be the expression
$\uparrow/\pi_1(\downarrow/\hat{b}/\downarrow/\hat{c}) - \mathrm{ch}_{\geq 2}(\varepsilon)/\uparrow$ in the language $\mathcal{X}(\downarrow, \uparrow, \pi_1, \mathrm{ch}_{\geq 2}, -)$ (or, for that
matter, in any language $\mathcal{X}(E)$ with $\{\downarrow, \uparrow, \pi_1, \mathrm{ch}_{\geq 2}, -\} \subseteq E$). Then, $e(D) =
\{(v_2, v_1), (v_8, v_4), (v_{10}, v_4)\}$.
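
As an illustration only, the semantics of Table 1 can be turned into a small evaluator. The following Python sketch is not part of the formal development: the nested-tuple encoding of expressions, the dictionary encoding of a document, the helper `comp`, and the four-node tree are all assumptions made for this example (the tree is hypothetical and is not the document of Figure 1).

```python
def ev(e, d):
    """Evaluate expression e on document d; the result is a set of node pairs."""
    nodes, edges, label = d["V"], d["Ed"], d["label"]
    op = e[0]
    if op == "empty":
        return set()
    if op == "eps":
        return {(v, v) for v in nodes}
    if op == "lab":                      # label test, e = ("lab", l)
        return {(v, v) for v in nodes if label[v] == e[1]}
    if op == "down":
        return set(edges)
    if op == "up":
        return {(c, p) for (p, c) in edges}
    if op == "pi1":
        return {(v, v) for (v, _) in ev(e[1], d)}
    if op == "pi2":
        return {(w, w) for (_, w) in ev(e[1], d)}
    if op == "inv":
        return {(w, v) for (v, w) in ev(e[1], d)}
    if op == "ch":                       # counting, e = ("ch", k, e1)
        ok = {v for (v, _) in ev(e[2], d)}
        return {(v, v) for v in nodes
                if sum(1 for (p, c) in edges if p == v and c in ok) >= e[1]}
    if op in ("comp", "union", "inter", "diff"):
        a, b = ev(e[1], d), ev(e[2], d)
        if op == "comp":
            return {(u, w) for (u, v) in a for (x, w) in b if v == x}
        if op == "union":
            return a | b
        if op == "inter":
            return a & b
        return a - b
    raise ValueError(f"unknown operation: {op}")


def comp(*es):
    """Build the composition e1/e2/.../en as nested binary compositions."""
    e = es[-1]
    for x in reversed(es[:-1]):
        e = ("comp", x, e)
    return e


# Hypothetical document: root n1 (a) with children n2 (b) and n3 (b); n2 has child n4 (c).
D = {"V": {"n1", "n2", "n3", "n4"},
     "Ed": {("n1", "n2"), ("n1", "n3"), ("n2", "n4")},
     "r": "n1",
     "label": {"n1": "a", "n2": "b", "n3": "b", "n4": "c"}}

UP, DOWN, EPS = ("up",), ("down",), ("eps",)
# The expression of Example 2.2: up/pi1(down/b/down/c) - ch>=2(eps)/up
E = ("diff",
     comp(UP, ("pi1", comp(DOWN, ("lab", "b"), DOWN, ("lab", "c")))),
     comp(("ch", 2, EPS), UP))
result = ev(E, D)
```

On this hypothetical tree, `result` relates each child of the root to the root, because the root has a `b`-labeled child with a `c`-labeled child and neither child of the root has two or more children.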

Not all the above operations are primitive, however. For instance, intersection
(“$\cap$”) is expressible as soon as set difference (“$-$”) is expressible, since, for
any two sets $A$ and $B$, $A \cap B = A - (A - B)$. Even more eliminations are
possible in the following setting.

Proposition 2.3. Let $E$ be a set of nonbasic operations containing set difference
(“$-$”) or intersection (“$\cap$”) for which $\downarrow$ and $\uparrow$ are both contained in $E$ or
both not contained in $E$. Then, for each expression $e$ in $\mathcal{X}(E)$, there is an
equivalent expression in $\mathcal{X}(E - \{\pi_1, \pi_2, {}^{-1}\})$.

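Proof. First, we eliminate both projections using the identities

    $\pi_1(e) = (e/e^{-1}) \cap \varepsilon$;

    $\pi_2(e) = (e^{-1}/e) \cap \varepsilon$.

Hence, each expression in $\mathcal{X}(E)$ can be replaced by an equivalent expression
in $\mathcal{X}((E \cup \{{}^{-1}\}) - \{\pi_1, \pi_2\})$. It remains to show that we can eliminate inverse
(“${}^{-1}$”). This follows from the following identities. In these, $D = (V, Ed, r, \lambda)$ is
a document, $\ell \in \mathcal{L}$ is a label, $k \geq 1$ is a natural number, and $e$, $e_1$, and $e_2$ are
expressions in $\mathcal{X}(E)$.

• $\emptyset^{-1}(D) = \emptyset(D)$;

• $\varepsilon^{-1}(D) = \varepsilon(D)$;

• $\hat{\ell}^{-1}(D) = \hat{\ell}(D)$;

• $\downarrow^{-1}(D) = \uparrow(D)$;

• $\uparrow^{-1}(D) = \downarrow(D)$;

• $(e^{-1})^{-1}(D) = e(D)$;

• $(e_1/e_2)^{-1}(D) = e_2^{-1}/e_1^{-1}(D)$;

• $\mathrm{ch}_{\geq k}(e)^{-1}(D) = \mathrm{ch}_{\geq k}(e)(D)$;

• $(e_1 \cup e_2)^{-1}(D) = e_1^{-1} \cup e_2^{-1}(D)$;

• $(e_1 \cap e_2)^{-1}(D) = e_1^{-1} \cap e_2^{-1}(D)$;

• $(e_1 - e_2)^{-1}(D) = e_1^{-1} - e_2^{-1}(D)$.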


□ 

Notice that in a language with both upward (“$\uparrow$”) and downward (“$\downarrow$”)
navigation, the identities $\pi_1(e)(D) = \pi_2(e^{-1})(D)$ and $\pi_2(e)(D) = \pi_1(e^{-1})(D)$
imply that one projection operation can be eliminated in favor of the other.
Hence, it does not make sense to consider the projection operations separately.

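Some counting operations (“$\mathrm{ch}_{\geq k}(e)$”) can also be simulated. One can easily
verify the following.

Proposition 2.4. Let $D = (V, Ed, r, \lambda)$ be a document. Then,

1. $\mathrm{ch}_{\geq 1}(e)(D) = \pi_1(\downarrow/e)(D)$;

2. $\mathrm{ch}_{\geq 2}(e)(D) = \pi_1(\downarrow/(\pi_1(e)/\uparrow/\downarrow/\pi_1(e) - \varepsilon))(D)$; and

3. $\mathrm{ch}_{\geq 3}(e)(D) = \pi_1(\downarrow/((\pi_1(e)/\uparrow/\downarrow/\pi_1(e) - \varepsilon)/(\pi_1(e)/\uparrow/\downarrow/\pi_1(e) - \varepsilon) - \varepsilon))(D)$.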
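
The three identities can be checked mechanically on a concrete tree. The sketch below is only a sanity check under stated assumptions: relations are represented as Python sets of pairs, the helper names (`compose`, `pi1`, `ch_at_least`, `simulated`) are ours, and the seven-node tree is hypothetical rather than the document of Figure 1.

```python
def compose(r, s):
    """Relational composition r/s."""
    return {(u, w) for (u, v) in r for (x, w) in s if v == x}


def pi1(r):
    """First projection: identity pairs on the domain of r."""
    return {(u, u) for (u, _) in r}


def ch_at_least(k, r, nodes, down):
    """ch_{>=k}(e)(D), computed directly from the definition in Table 1."""
    ok = {u for (u, _) in r}          # nodes w with (w, w) in pi1(e)(D)
    return {(v, v) for v in nodes
            if sum(1 for (p, c) in down if p == v and c in ok) >= k}


def simulated(k, r, nodes, down):
    """Right-hand sides of Proposition 2.4 for k = 1, 2, 3."""
    eps = {(v, v) for v in nodes}
    up = {(c, p) for (p, c) in down}
    if k == 1:
        return pi1(compose(down, r))
    # pairs of distinct siblings that both satisfy pi1(e)
    sib = compose(compose(pi1(r), up), compose(down, pi1(r))) - eps
    if k == 2:
        return pi1(compose(down, sib))
    if k == 3:
        return pi1(compose(down, compose(sib, sib) - eps))
    raise ValueError("only k = 1, 2, 3 are covered by Proposition 2.4")


# Hypothetical tree: r has children a1, a2, a3; a1 has b1, b2; a2 has b3.
nodes = {"r", "a1", "a2", "a3", "b1", "b2", "b3"}
down = {("r", "a1"), ("r", "a2"), ("r", "a3"),
        ("a1", "b1"), ("a1", "b2"), ("a2", "b3")}
label = {"r": "a", "a1": "b", "a2": "b", "a3": "b",
         "b1": "c", "b2": "c", "b3": "c"}

eps_rel = {(v, v) for v in nodes}                       # e = epsilon
lab_c = {(v, v) for v in nodes if label[v] == "c"}      # e = label test for c

agree = all(ch_at_least(k, e, nodes, down) == simulated(k, e, nodes, down)
            for k in (1, 2, 3) for e in (eps_rel, lab_c))
```

Agreement on one tree is of course no proof; it merely exercises the three right-hand sides against the definition.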


Example 2.5. Consider again the expression $e := \uparrow/\pi_1(\downarrow/\hat{b}/\downarrow/\hat{c}) - \mathrm{ch}_{\geq 2}(\varepsilon)/\uparrow$ of
Example 2.2. Using Proposition 2.4 and making some straightforward simplifications,
we can rewrite $e$ as $\uparrow/\pi_1(\downarrow/\hat{b}/\downarrow/\hat{c}) - \pi_1(\downarrow/(\uparrow/\downarrow - \varepsilon))/\uparrow$, an expression
of $\mathcal{X}(\downarrow, \uparrow, \pi_1, -)$. Alternatively, one can use Proposition 2.3 and the techniques
exhibited in its proof to rewrite $e$ as

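    $\uparrow/(\downarrow/\hat{b}/\downarrow/\hat{c}/\uparrow/\uparrow) - \downarrow/((\uparrow/\downarrow - \varepsilon)/(\uparrow/\downarrow - \varepsilon) \cap \varepsilon)/\uparrow/\uparrow$,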


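an expression in $\mathcal{X}(\downarrow, \uparrow, \cap, -)$. Finally, we invite the reader to verify that $e$ can
also be rewritten as

    $\pi_1(\varepsilon - \pi_1(\downarrow/(\uparrow/\downarrow - \varepsilon)))/\uparrow/\pi_1(\downarrow/\hat{b}/\downarrow/\hat{c})$,

also an expression of $\mathcal{X}(\downarrow, \uparrow, \pi_1, -)$.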
We shall call the language $\mathcal{X}(\downarrow, \uparrow, \pi_1, \pi_2, {}^{-1}, \cap, -)$, which by Proposition 2.3
is equivalent to $\mathcal{X}(\downarrow, \uparrow, -)$, the XPath algebra.^1 This is justified by the following
result.

Proposition 2.6. Given a single document $D = (V, Ed, r, \lambda)$, the XPath algebra
is equivalent to XPath.

Proof. Notice that $\ell$ in XPath m is simulated by $\downarrow/\hat{\ell}$ in the XPath algebra.
Furthermore, $\hat{\ell}$ in the XPath algebra is simulated by $\varepsilon[\mathrm{label} = \ell]$ in XPath.
The proof is complete if, for each predicate $P$ in XPath, there exists an XPath
algebra expression $e$ such that $e(D) = \{(n, n) \mid n \in P(D)\}$. This is proved by
structural induction:

1. if $P$ is an XPath expression without predicates, then take $e := \pi_1(f)$, with
$f$ the XPath algebra expression obtained from $P$ by replacing everywhere
$\ell$ by $\downarrow/\hat{\ell}$;

2. if $P$ is $\mathrm{label} = \ell$, then take $e := \hat{\ell}$;

3. if $P$ is $\neg Q$, with $Q$ an XPath predicate, then take $e := \varepsilon - f$, with $f$ the
XPath algebra expression corresponding to $Q$;

4. if $P$ is $Q_1 \wedge Q_2$, with $Q_1$ and $Q_2$ XPath predicates, then take $e := f_1 \cap f_2$,
with $f_1$ and $f_2$ the XPath algebra expressions corresponding to $Q_1$ and
$Q_2$, respectively;

5. if $P$ is $Q_1 \vee Q_2$, with $Q_1$ and $Q_2$ XPath predicates, then take $e := f_1 \cup f_2$,
with $f_1$ and $f_2$ the XPath algebra expressions corresponding to $Q_1$ and
$Q_2$, respectively.

□
Besides the standard languages $\mathcal{X}(E)$, with $E$ a set of nonbasic operations,
we also consider the so-called core languages $\mathcal{C}(E)$. More concretely, $\mathcal{C}(E)$ is
defined recursively in the same way as $\mathcal{X}(E - \{\cap, -\})$, except that in expressions
of the form $\pi_1(f)$ and $\pi_2(f)$, $f$ may be a boolean combination of expressions
of the language using union and the operations in $E \cap \{\cap, -\}$, rather than just
an expression of the language.

The above terminology is inspired by the fact that $\mathcal{C}(\downarrow, \uparrow, \pi_1, \pi_2, -, \cap)$, the
language which we call the core XPath algebra, is the adaptation to our setting
of Core XPath of Gottlob and Koch [16].


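Example 2.7. Continuing with Example 2.5, we consider again the expression
$e := \uparrow/\pi_1(\downarrow/\hat{b}/\downarrow/\hat{c}) - \mathrm{ch}_{\geq 2}(\varepsilon)/\uparrow$ of Example 2.2. Obviously, there is no core
language of which $e$ is an expression, as set difference (“$-$”) occurs at the outer
level, and not in a subexpression $f$ which in turn is embedded in a subexpression
of the form $\pi_1(f)$ or $\pi_2(f)$. However, in Example 2.5, the expression $e$ has been
shown to be equivalent to

    $\pi_1(\varepsilon - \pi_1(\downarrow/(\uparrow/\downarrow - \varepsilon)))/\uparrow/\pi_1(\downarrow/\hat{b}/\downarrow/\hat{c})$,

which is an expression of $\mathcal{C}(\downarrow, \uparrow, \pi_1, -, \cap)$, and hence also of the core XPath
algebra.

^1 Note that the XPath algebra corresponds to the (full) relation algebra of Tarski [2],
adapted to our setting (cf. HI).

Table 2: Languages studied in this paper. Each language is followed, on an indented line,
by the corresponding relation algebra fragment.

strictly downward (core) XPath algebra with counting up to $k$
    $\mathcal{X}(\downarrow, \pi_1, \mathrm{ch}_{\geq 1}(.), \ldots, \mathrm{ch}_{\geq k}(.), -) = \mathcal{C}(\downarrow, \pi_1, \mathrm{ch}_{\geq 1}(.), \ldots, \mathrm{ch}_{\geq k}(.), -)$

strictly downward (core) positive XPath algebra
    $\mathcal{X}(\downarrow, \pi_1, \cap) = \mathcal{C}(\downarrow, \pi_1, \cap)$

weakly downward (core) XPath algebra with counting up to $k$
    $\mathcal{X}(\downarrow, \pi_1, \pi_2, \mathrm{ch}_{\geq 1}(.), \ldots, \mathrm{ch}_{\geq k}(.), -) = \mathcal{C}(\downarrow, \pi_1, \pi_2, \mathrm{ch}_{\geq 1}(.), \ldots, \mathrm{ch}_{\geq k}(.), -)$

weakly downward (core) positive XPath algebra
    $\mathcal{X}(\downarrow, \pi_1, \pi_2) = \mathcal{X}(\downarrow, \pi_1, \pi_2, \cap) = \mathcal{C}(\downarrow, \pi_1, \pi_2, \cap)$

strictly upward (core) XPath algebra
    $\mathcal{X}(\uparrow, \pi_1, -) = \mathcal{C}(\uparrow, \pi_1, -)$

strictly upward (core) positive XPath algebra
    $\mathcal{X}(\uparrow, \pi_1, \cap) = \mathcal{C}(\uparrow, \pi_1, \cap)$

weakly upward languages
    see Section 7.2

XPath algebra
    $\mathcal{X}(\downarrow, \uparrow, \pi_1, \pi_2, {}^{-1}, \cap, -) = \mathcal{X}(\downarrow, \uparrow, -)$

XPath algebra with counting up to $k$
    $\mathcal{X}(\downarrow, \uparrow, \mathrm{ch}_{\geq 1}(.), \ldots, \mathrm{ch}_{\geq k}(.), -)$

core XPath algebra
    $\mathcal{C}(\downarrow, \uparrow, \pi_1, \pi_2, -, \cap)$

core XPath algebra with counting up to $k$
    $\mathcal{C}(\downarrow, \uparrow, \pi_1, \pi_2, \mathrm{ch}_{\geq 1}(.), \ldots, \mathrm{ch}_{\geq k}(.), -)$

(core) positive XPath algebra [31]
    $\ldots, \cap) = \mathcal{X}(\downarrow, \uparrow, \pi_1, \pi_2) = \mathcal{C}(\downarrow, \uparrow, \pi_1, \pi_2, \cap)$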

Given a set of nonbasic operators $E$, an expression in $\mathcal{X}(E)$ can in general
not be converted to an equivalent expression in $\mathcal{C}(E)$, as will follow
from the results of this paper, even though there are exceptions (Theorem 5.19).

Table 2 gives an overview of the various relation algebra fragments we investigate
below.

To conclude this section, we observe that, given a document and an expression,
we have defined the semantics of that expression as a binary relation over
the nodes of the document, i.e., as a set of pairs of nodes. From the perspective
of navigation, however, it is useful to be able to say that an expression allows
one to navigate from one node of the document to another. For this purpose,
we introduce the following notation.

Definition 2.8. Let $e$ be an arbitrary expression, and let $D = (V, Ed, r, \lambda)$ be
a document. For $v \in V$, $e(D)(v) := \{w \mid (v, w) \in e(D)\}$.

Definition 2.8 reflects the “local” perspective of an expression working on 
particular nodes of a document, rather than the “global” perspective of working 
on an entire document. 


Example 2.9. Consider again the expression $e := \uparrow/\pi_1(\downarrow/\hat{b}/\downarrow/\hat{c}) - \mathrm{ch}_{\geq 2}(\varepsilon)/\uparrow$
of Example 2.2. We have established that, for the document $D$ in Figure 1,
$e(D) = \{(v_2, v_1), (v_8, v_4), (v_{10}, v_4)\}$. Hence, $e(D)(v_8) = \{v_4\}$ and $e(D)(v_1) = \emptyset$.
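
The passage from the global relation $e(D)$ to the local set $e(D)(v)$ of Definition 2.8 is a simple selection on the first component. A minimal sketch follows, assuming that a relation is stored as a Python set of pairs of node names; the function name `local` is ours.

```python
def local(rel, v):
    """e(D)(v): the nodes reachable from v under the relation e(D) (Definition 2.8)."""
    return {w for (u, w) in rel if u == v}


# e(D) as stated in Example 2.9
e_D = {("v2", "v1"), ("v8", "v4"), ("v10", "v4")}

from_v8 = local(e_D, "v8")
from_v1 = local(e_D, "v1")
```

Here `from_v8` and `from_v1` reproduce the two sets computed in Example 2.9.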


3. Signatures 

Given a pair of nodes in a document, there is a unique path in that document 
(not taking into account the direction of the edges) to navigate from the first to 
the second node, in general by going a few steps upward in the tree, and then 
going a few steps downward. We call this the signature of that pair of nodes, 
and shall formally represent it by an expression in $\mathcal{X}(\downarrow, \uparrow)$.

Definition 3.1. Let $D = (V, Ed, r, \lambda)$ be a document, and let $v, w \in V$. The
signature of the pair $(v, w)$, denoted $\mathrm{sig}(v, w)$, is the expression in $\mathcal{X}(\downarrow, \uparrow)$ that
is recursively defined as follows:

• if $v = w$, then $\mathrm{sig}(v, w) := \varepsilon$;

• if $v$ is an ancestor of $w$, and $z$ is the child of $v$ on the path from $v$ to $w$,
then $\mathrm{sig}(v, w) := \downarrow/\mathrm{sig}(z, w)$;

• otherwise,^2 if $z$ is the parent of $v$, then $\mathrm{sig}(v, w) := \uparrow/\mathrm{sig}(z, w)$.

Given nodes $v$ and $w$ of a document $D = (V, Ed, r, \lambda)$, we denote by $\mathrm{top}(v, w)$
the unique node on the undirected path from $v$ to $w$ that is an ancestor of both
$v$ and $w$. Clearly,

    $\mathrm{sig}(v, w) = \mathrm{sig}(v, \mathrm{top}(v, w))/\mathrm{sig}(\mathrm{top}(v, w), w) = \uparrow^m/\downarrow^n$,

where $m$, respectively $n$, is the distance from $\mathrm{top}(v, w)$ to $v$, respectively $w$; and,
for an expression $e$ and a natural number $i \geq 1$, $e^i$ denotes the $i$-fold composition
of $e$.^3 (We put $e^0 := \varepsilon$.)

The signature of a pair of nodes of a document can be seen as a description 
of the unique path connecting these nodes, but also as an expression that can 
be applied to the document under consideration. We shall often exploit this 
duality. 


^2 In particular, $v \neq r$.

^3 Here, and elsewhere in this paper, equality between expressions must be interpreted at
the semantic and not at the syntactic level, i.e., for two expressions $e_1$ and $e_2$ in one of the
languages considered here, $e_1 = e_2$ means that, for each document $D$, $e_1(D) = e_2(D)$.
Example 3.2. For the document $D$ in Figure 1, $\mathrm{sig}(v_1, v_1) = \varepsilon$, $\mathrm{sig}(v_1, v_2) = \downarrow$,
$\mathrm{sig}(v_6, v_4) = \uparrow^2/\downarrow$, and $\mathrm{sig}(v_{11}, v_5) = \uparrow^3/\downarrow^2$. We have that

$\mathrm{sig}(v_{11}, v_5)(D) = \{(v_{11}, v_5), (v_{12}, v_5), (v_{13}, v_5), (v_{11}, v_6), (v_{12}, v_6), (v_{13}, v_6),
(v_{11}, v_7), (v_{12}, v_7), (v_{13}, v_7), (v_{11}, v_8), (v_{12}, v_8), (v_{13}, v_8),
(v_{11}, v_9), (v_{12}, v_9), (v_{13}, v_9), (v_{11}, v_{10}), (v_{12}, v_{10}), (v_{13}, v_{10})\}$.

Notice that not each pair in the result has the same signature as $(v_{11}, v_5)$. For
instance, $\mathrm{sig}(v_{11}, v_8) = \uparrow^2/\downarrow$ and $\mathrm{sig}(v_{11}, v_9) = \uparrow$.


Now, let $(v_1, w_1)$ and $(v_2, w_2)$ be two pairs of nodes in a document $D =
(V, Ed, r, \lambda)$. We say that $(v_1, w_1)$ subsumes $(v_2, w_2)$, denoted $(v_1, w_1) \gtrsim (v_2, w_2)$,
if $(v_2, w_2)$ is in $\mathrm{sig}(v_1, w_1)(D)$. We say that $(v_1, w_1)$ and $(v_2, w_2)$ are congruent,
denoted $(v_1, w_1) \cong (v_2, w_2)$, if $(v_1, w_1) \gtrsim (v_2, w_2)$ and $(v_2, w_2) \gtrsim (v_1, w_1)$. It can
be easily seen that, in this case, $\mathrm{sig}(v_1, w_1) = \mathrm{sig}(v_2, w_2)$. Informally speaking,
the path from $v_1$ to $w_1$ then has the same shape as the path from $v_2$ to $w_2$.
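
Signatures, subsumption, and congruence are easy to compute once each node knows its parent. The sketch below is illustrative only and rests on our own assumptions: a signature $\uparrow^m/\downarrow^n$ is represented by the pair `(m, n)`, the tree is given as a child-to-parent dictionary, and the seven-node tree is hypothetical (it is not the document of Figure 1).

```python
def ancestors(parent, v):
    """Chain v, parent(v), ..., root."""
    chain = [v]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def sig(parent, v, w):
    """Return (m, n) such that sig(v, w) = up^m / down^n."""
    up_v, up_w = ancestors(parent, v), ancestors(parent, w)
    m = next(i for i, x in enumerate(up_v) if x in up_w)  # distance from top(v, w) to v
    n = up_w.index(up_v[m])                               # distance from top(v, w) to w
    return m, n


def apply_sig(parent, nodes, m, n):
    """The relation (up^m / down^n)(D): the m-th ancestor of x is the n-th ancestor of y."""
    chains = {x: ancestors(parent, x) for x in nodes}
    return {(x, y) for x in nodes for y in nodes
            if len(chains[x]) > m and len(chains[y]) > n
            and chains[x][m] == chains[y][n]}


def subsumes(parent, nodes, p, q):
    """p subsumes q if q belongs to sig(p)(D)."""
    return q in apply_sig(parent, nodes, *sig(parent, *p))


def congruent(parent, nodes, p, q):
    return subsumes(parent, nodes, p, q) and subsumes(parent, nodes, q, p)


# Hypothetical tree: r has children a, b; a has child c; b has child d;
# c has child g; d has child f.
parent = {"a": "r", "b": "r", "c": "a", "d": "b", "g": "c", "f": "d"}
nodes = set(parent) | {"r"}
```

In this tree, `("f", "c")` subsumes `("f", "d")` but not conversely, whereas `("f", "c")` and `("g", "d")` are congruent, mirroring the situation described in the text.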


Example 3.3. Consider again Example 3.2. Clearly, $(v_{11}, v_5)$ subsumes each
pair of nodes in $\mathrm{sig}(v_{11}, v_5)(D)$, e.g., $(v_{11}, v_5) \gtrsim (v_{12}, v_6)$ and $(v_{11}, v_5) \gtrsim (v_{12}, v_9)$.
Notice that also $(v_{12}, v_6) \gtrsim (v_{11}, v_5)$, and hence $(v_{11}, v_5) \cong (v_{12}, v_6)$. However,
$(v_{12}, v_9) \not\gtrsim (v_{11}, v_5)$. Hence, these pairs are not congruent.


By definition, subsumption is captured by the “sig” expression. One may 
wonder if there also exists an expression that precisely captures congruence. 
This is the case in the following situations. 

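Proposition 3.4. Let $D = (V, Ed, r, \lambda)$ be a document and let $v_1, v_2, w_1, w_2 \in
V$. Then,

1. if $v_1$ is an ancestor of $w_1$ or vice versa, $(v_1, w_1) \cong (v_2, w_2)$ if and only if
$(v_2, w_2) \in \mathrm{sig}(v_1, w_1)(D)$;

2. otherwise, let $\mathrm{sig}(v_1, w_1) = \uparrow^m/\downarrow^n$. Then, as $m \geq 1$ and $n \geq 1$, $(v_1, w_1) \cong
(v_2, w_2)$ if and only if $(v_2, w_2) \in \uparrow^m/\downarrow^n - \uparrow^{m-1}/\downarrow^{n-1}(D)$.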

Proof. 1. As the “only if” is trivial, it suffices to consider the “if,” which
follows from a straightforward induction argument.

2. As the “only if” is straightforward, we only consider the “if.” Let $t_2 :=
\uparrow^m(D)(v_2)$. Since $w_2 \in \downarrow^n(D)(t_2)$, $t_2$ is a common ancestor of $v_2$ and $w_2$. Let $v_2'$
and $w_2'$ be the children of $t_2$ on the path to $v_2$ and $w_2$, respectively. If
$v_2' = w_2'$, then $(v_2, w_2) \in \uparrow^{m-1}/\downarrow^{n-1}(D)$, a contradiction. Hence, $v_2' \neq w_2'$
and $t_2 = \mathrm{top}(v_2, w_2)$, and $\mathrm{sig}(v_2, w_2) = \uparrow^m/\downarrow^n = \mathrm{sig}(v_1, w_1)$.

□ 


For later use, but also because of their independent interest, we finally note 
the following fundamental properties of subsumption and congruence. 

Proposition 3.5. Let $v$, $w$, $v_1$, $w_1$, $z_1$, $v_2$, $w_2$, and $z_2$ be nodes of a document
$D$. Then the following properties hold.

1. $(v, v) \gtrsim (w, w)$.

2. $(v_1, w_1) \gtrsim (v_2, w_2)$ implies that $(w_1, v_1) \gtrsim (w_2, v_2)$.

3. If $\mathrm{top}(v_1, z_1)$ is also an ancestor of $w_1$, then $(v_1, w_1) \gtrsim (v_2, w_2)$ and
$(w_1, z_1) \gtrsim (w_2, z_2)$ imply that $(v_1, z_1) \gtrsim (v_2, z_2)$.

4. All properties above also hold when subsumption is replaced by congruence,
provided that, in item 3, the condition “$\mathrm{top}(v_2, z_2)$ is also an ancestor of
$w_2$” is added.

Figure 2: Document of Example 3.6.

Proof. All properties are straightforward, except for Property 3. So, assume
that $(v_1, w_1) \gtrsim (v_2, w_2)$ and $(w_1, z_1) \gtrsim (w_2, z_2)$. Hence, $(v_2, w_2) \in \mathrm{sig}(v_1, w_1)(D)$
and $(w_2, z_2) \in \mathrm{sig}(w_1, z_1)(D)$, as a consequence of which

    $(v_2, z_2) \in \mathrm{sig}(v_1, w_1)/\mathrm{sig}(w_1, z_1)(D)$.

For the sake of abbreviation, let $t_1 := \mathrm{top}(v_1, w_1)$ and $u_1 := \mathrm{top}(w_1, z_1)$. Using
these nodes, we can write

    $\mathrm{sig}(v_1, w_1)/\mathrm{sig}(w_1, z_1) = \mathrm{sig}(v_1, t_1)/\mathrm{sig}(t_1, w_1)/\mathrm{sig}(w_1, u_1)/\mathrm{sig}(u_1, z_1)$,

which is equal to $\mathrm{sig}(v_1, s_1)/\mathrm{sig}(s_1, z_1)$, where $s_1$ is the higher of $t_1$ and $u_1$
in $D$. Notice that $s_1$ is a common ancestor of $v_1$ and $z_1$, as a consequence
of which it is also an ancestor of $\mathrm{top}(v_1, z_1)$, the least common ancestor of $v_1$
and $z_1$. By assumption, $\mathrm{top}(v_1, z_1)$ is a common ancestor of $v_1$, $w_1$, and $z_1$,
and hence also of $\mathrm{top}(v_1, w_1)$ and $\mathrm{top}(w_1, z_1)$, the highest of which is $s_1$. Thus,
$s_1 = \mathrm{top}(v_1, z_1)$, and, therefore, $\mathrm{sig}(v_1, s_1)/\mathrm{sig}(s_1, z_1) = \mathrm{sig}(v_1, z_1)$. In summary,
$(v_2, z_2) \in \mathrm{sig}(v_1, z_1)(D)$, and hence $(v_1, z_1) \gtrsim (v_2, z_2)$. □

Observe that the condition in Proposition 3.5(3) is necessary for that part
of the proposition to hold, as shown by the following counterexample.

Example 3.6. Consider the document in Figure 2. Labels have been omitted,
because they are not relevant in this discussion. (We assume all nodes have the
same label.) Observe that $(v_1, w_1) \cong (v_2, w_2)$ and $(w_1, z_1) \cong (w_2, z_2)$. However,
$\mathrm{top}(v_1, z_1)$ is not an ancestor of $w_1$; hence, Proposition 3.5(3) is not applicable.
We see that, indeed, $(v_1, z_1)$ does not subsume $(v_2, z_2)$, let alone that $(v_1, z_1)$
and $(v_2, z_2)$ would be congruent.

4. Distinguishability of nodes in a document 

We wish to link the distinguishing power of a navigational language on a
document to syntactic conditions which can readily be verified on that document.
As argued before, the action of an expression on a document can be
interpreted as (1) returning pairs of nodes, or (2) given a node, returning the 
set of nodes that can be reached from that node. We shall refer to the first 
interpretation as the pairs semantics, and to the second interpretation as the 
node semantics. In this section, we propose suitable semantic and syntactic 
notions of distinguishability for the node semantics. 

4.1. Distinguishability of nodes at the semantic level 

We propose the following distinguishability criterion based on the emptiness 
or nonemptiness of the set of nodes that can be reached by applying an arbitrary 
expression of the language under consideration. 

Definition 4.1. Let $L$ be one of the languages considered in Section 2. Let
$D = (V, Ed, r, \lambda)$ be a document, and let $v_1, v_2 \in V$. Then,

1. $v_1$ and $v_2$ are expression-related, denoted $v_1 \geq_{\mathrm{exp}} v_2$, if, for each expression
$e$ in $L$, $e(D)(v_1) \neq \emptyset$ implies $e(D)(v_2) \neq \emptyset$; and

2. $v_1$ and $v_2$ are expression-equivalent, denoted $v_1 \equiv_{\mathrm{exp}} v_2$, if $v_1 \geq_{\mathrm{exp}} v_2$ and
$v_2 \geq_{\mathrm{exp}} v_1$.

In principle, we should have reflected the language under consideration in 
the notation for expression-equivalence. As the language under consideration 
will always be clear from the context, we chose not to do so in order to avoid 
overloaded notation. 

The following observation is useful. 

Proposition 4.2. Let $E$ be a set of nonbasic operations containing first projection
(“$\pi_1$”) and set difference (“$-$”). Consider expression-equivalence with
respect to $\mathcal{X}(E)$. Let $D = (V, Ed, r, \lambda)$ be a document, and let $v_1, v_2 \in V$. Then,
$v_1 \equiv_{\mathrm{exp}} v_2$ if and only if $v_1 \geq_{\mathrm{exp}} v_2$.

Proof. Assume that $v_2 \not\geq_{\mathrm{exp}} v_1$. Then there exists an expression $f$ in $\mathcal{X}(E)$
such that $f(D)(v_2) \neq \emptyset$ and $f(D)(v_1) = \emptyset$. Now consider $e := \pi_1(\varepsilon - \pi_1(f))$.
Clearly, $e(D)(v_2) = \emptyset$ and $e(D)(v_1) \neq \emptyset$, hence $v_1 \not\geq_{\mathrm{exp}} v_2$. By contraposition,
$v_1 \geq_{\mathrm{exp}} v_2$ implies $v_2 \geq_{\mathrm{exp}} v_1$, and hence also $v_1 \equiv_{\mathrm{exp}} v_2$. □
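
The complementation step in this proof can be illustrated concretely. The sketch below assumes that relations are Python sets of pairs; the function name `separating_expression` and the three-node example are ours and purely hypothetical. It evaluates $e := \pi_1(\varepsilon - \pi_1(f))$ from a given relation $f(D)$.

```python
def pi1(rel):
    """First projection: identity pairs on the domain of rel."""
    return {(u, u) for (u, _) in rel}


def separating_expression(f_rel, nodes):
    """Evaluate e := pi1(eps - pi1(f)) on a document with the given nodes."""
    eps = {(v, v) for v in nodes}
    return pi1(eps - pi1(f_rel))


# Hypothetical situation: f(D)(v2) is nonempty, f(D)(v1) is empty.
nodes = {"v1", "v2", "w"}
f_D = {("v2", "w")}
e_D = separating_expression(f_D, nodes)

nonempty_at_v1 = any(u == "v1" for (u, _) in e_D)
nonempty_at_v2 = any(u == "v2" for (u, _) in e_D)
```

As in the proof, the roles are swapped: `e_D` is nonempty at `v1` and empty at `v2`.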
4.2. Distinguishability of nodes at the syntactic level

Our syntactic criterion of distinguishability is based on the similarity of the 
documents locally around the nodes under consideration. In order to decide this 
similarity, we shall consider a hierarchy for the degree of coarseness by which 
we compare the environments of those nodes. We shall also consider variants 
for the cases where from the given nodes of the document we (1) only look 
downward; (2) only look upward; or (3) look in both directions. 

4.2.1. Downward distinguishability

For the downward case, we consider the following syntactic notions of distinguishability
of nodes. They are all defined recursively on the height of the
first node.

Definition 4.3. Let $D = (V, Ed, r, \lambda)$ be a document, let $v_1, v_2 \in V$, and let
$k \geq 1$. Then, $v_1$ and $v_2$ are downward-$k$-equivalent, denoted $v_1 \equiv^k_\downarrow v_2$, if

1. $\lambda(v_1) = \lambda(v_2)$;

2. for each child $w_1$ of $v_1$, there exists a child $w_2$ of $v_2$ such that $w_1 \equiv^k_\downarrow w_2$,
and vice versa;

3. for each child $w_1$ of $v_1$ and $w_2$ of $v_2$ such that $w_1 \equiv^k_\downarrow w_2$, $\min(|\tilde{w}_1|, k) =
\min(|\tilde{w}_2|, k)$, where, for $i = 1, 2$, $\tilde{w}_i$ is the set of all siblings of $w_i$ (including
$w_i$ itself) that are downward-$k$-equivalent to $w_i$.

For $k = 1$, the third condition in the above definition is trivially satisfied.
In the literature, downward-1-equivalence is usually referred to as bisimilarity

m- 
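
Definition 4.3 translates directly into a memoized recursion on pairs of nodes. The following sketch is an illustration under our own assumptions: the tree is given as a dictionary from nodes to tuples of children, `make_down_equiv` is a name we introduce, and the six-node tree is hypothetical. It is a naive transcription of the definition, not an efficient partition-refinement algorithm.

```python
from functools import lru_cache


def make_down_equiv(children, label, k):
    """Return a function deciding downward-k-equivalence (Definition 4.3)."""

    @lru_cache(maxsize=None)
    def equiv(v1, v2):
        # Condition 1: equal labels.
        if label[v1] != label[v2]:
            return False
        c1, c2 = children.get(v1, ()), children.get(v2, ())
        # Condition 2: every child on one side has an equivalent child on the other.
        if not all(any(equiv(w1, w2) for w2 in c2) for w1 in c1):
            return False
        if not all(any(equiv(w1, w2) for w1 in c1) for w2 in c2):
            return False
        # Condition 3: equivalent children have the same number of
        # equivalent siblings, counted up to k.
        for w1 in c1:
            for w2 in c2:
                if equiv(w1, w2):
                    n1 = sum(1 for s in c1 if equiv(s, w1))
                    n2 = sum(1 for s in c2 if equiv(s, w2))
                    if min(n1, k) != min(n2, k):
                        return False
        return True

    return equiv


# Hypothetical tree: r (a) has children x (b) and y (b);
# x has one c-child, y has two c-children.
children = {"r": ("x", "y"), "x": ("x1",), "y": ("y1", "y2")}
label = {"r": "a", "x": "b", "y": "b", "x1": "c", "y1": "c", "y2": "c"}
all_nodes = list(label)

eq1 = make_down_equiv(children, label, 1)   # bisimilarity
eq2 = make_down_equiv(children, label, 2)
```

On this tree, `x` and `y` are downward-1-equivalent but not downward-2-equivalent, since counting up to 2 separates one `c`-child from two.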

Example 4.4. Consider again the example document in Figure 1. Notice that
V 2 'Cio for any value of fc > 1. We also have that V 2 =]; v^, and, for any value 
of fc > 2, V 2 V 3 . Finally, notice that V 3 V 4 for any value of fc > 1. 

The following is immediate from the second condition in Definition 4.3.

Proposition 4.5. Let $D = (V, Ed, r, \lambda)$ be a document, let $v_1, v_2 \in V$, and let



The following property of downward-$k$-equivalence will turn out to be very
useful in the sequel.

Proposition 4.6. Let $D = (V, Ed, r, \lambda)$ be a document, and let $k \geq 1$. Let “$\equiv$”
be an equivalence relation on $V$ such that, for all $v_1, v_2 \in V$ with $v_1 \equiv v_2$,

1. $\lambda(v_1) = \lambda(v_2)$;

2. for each child $w_1$ of $v_1$, there exists a child $w_2$ of $v_2$ such that $w_1 \equiv w_2$,
and vice versa; and

3. for each child $w_1$ of $v_1$ and each child $w_2$ of $v_2$ such that $w_1 \equiv w_2$,
$\min(|\tilde{w}_1|, k) = \min(|\tilde{w}_2|, k)$, where, for $i = 1, 2$, $\tilde{w}_i$ is the set of all siblings
of $w_i$ (including $w_i$ itself) that are equivalent to $w_i$ under “$\equiv$.”

Then, for all $v_1, v_2 \in V$, $v_1 \equiv v_2$ implies $v_1 \equiv^k_\downarrow v_2$.

^4 For a set $A$, $|A|$ denotes the cardinality of $A$.

^5 By the height of a node, we mean the length of the longest path from that node to a leaf.


Proof. By induction on the height of $v_1$.

If $v_1$ is a leaf, the second condition above implies that $v_2$ must also be a leaf.
By the first condition, $\lambda(v_1) = \lambda(v_2)$. Hence, $v_1 \equiv^k_\downarrow v_2$.

If $v_1$ is not a leaf, we still have, by the first condition, that $\lambda(v_1) = \lambda(v_2)$.
Hence the first condition in the definition of $v_1 \equiv^k_\downarrow v_2$ (Definition 4.3) is satisfied.

The second condition in the definition of $v_1 \equiv^k_\downarrow v_2$ follows from the second
condition above and the induction hypothesis.

It remains to show that also the third condition in the definition of $v_1 \equiv^k_\downarrow v_2$
holds. Thereto, let $w_1$ be a child of $v_1$ and $w_2$ be a child of $v_2$ such that $w_1 \equiv^k_\downarrow w_2$.
We show that $\min(|\tilde{w}_1|, k) = \min(|\tilde{w}_2|, k)$, where, for $i = 1, 2$, $\tilde{w}_i$ is the set of
all siblings of $w_i$ (including $w_i$ itself) that are downward-$k$-equivalent to $w_i$.
Let $\{W_{11}, \ldots, W_{1\ell}\}$ be the coarsest partition of $\tilde{w}_1$ in $\equiv$-equivalent nodes, and
let $\{W_{21}, \ldots, W_{2\ell}\}$ be the coarsest partition of $\tilde{w}_2$ in $\equiv$-equivalent nodes. By
the induction hypothesis and the second condition above, both partitions have
indeed the same size. It follows furthermore that no node of $\tilde{w}_1$ is $\equiv$-equivalent
with a child of $v_1$ outside $\tilde{w}_1$, and that no node of $\tilde{w}_2$ is $\equiv$-equivalent with a
child of $v_2$ outside $\tilde{w}_2$. Without loss of generality, we may assume that, for
$i = 1, \ldots, \ell$, every node in $W_{1i}$ is $\equiv$-equivalent to every node in $W_{2i}$. Hence, by
the third condition above, $\min(|W_{1i}|, k) = \min(|W_{2i}|, k)$. We now distinguish
two cases.

1. For all $i = 1, \ldots, \ell$, $|W_{1i}| < k$. Then, for all $i = 1, \ldots, \ell$, $|W_{1i}| = |W_{2i}|$. It
follows that $|\tilde{w}_1| = |\tilde{w}_2|$, and, hence, also that $\min(|\tilde{w}_1|, k) = \min(|\tilde{w}_2|, k)$.

2. For some $i$, $1 \leq i \leq \ell$, $|W_{1i}| \geq k$. Then, also $|W_{2i}| \geq k$. Hence,
$|\tilde{w}_1| \geq k$ and $|\tilde{w}_2| \geq k$. It follows that $\min(|\tilde{w}_1|, k) = \min(|\tilde{w}_2|, k) = k$.

We conclude that, in both cases, the third condition in the definition of $v_1 \equiv^k_\downarrow v_2$
is also satisfied. □


So, given a document $D = (V, Ed, r, \lambda)$, downward-$k$-equivalence is the coarsest
equivalence relation on $V$ satisfying Proposition 4.6.

A straightforward application of Proposition 4.6 yields

Corollary 4.7. Let $k \geq 1$. Let $D = (V, Ed, r, \lambda)$ be a document, and let $v_1, v_2 \in
V$. If $v_1 \equiv^{k+1}_\downarrow v_2$, then $v_1 \equiv^k_\downarrow v_2$.


Proof. It suffices to observe that $\equiv^{k+1}_\downarrow$ is an equivalence relation satisfying
Proposition 4.6 for the value of $k$ in the statement of the corollary above. For
the first two conditions in Proposition 4.6, this follows immediately from the
corresponding conditions in Definition 4.3. For the third condition in Proposition 4.6,
this also follows from the third condition in Definition 4.3 if one takes
into account that, for arbitrary sets $A$ and $B$, $\min(|A|, k + 1) = \min(|B|, k + 1)$
implies that $\min(|A|, k) = \min(|B|, k)$. □
4.2.2. Upward distinguishability

If we only look upward in the document, there is only one reasonable definition
of node distinguishability, as each node has at most one parent. In contrast
with the downward case, the recursion in the definition is on the depth of the
first node.

Definition 4.8. Let $D = (V, Ed, r, \lambda)$ be a document, and let $v_1, v_2 \in V$. Then,
$v_1$ and $v_2$ are upward-equivalent, denoted $v_1 \equiv_\uparrow v_2$, if

1. $\lambda(v_1) = \lambda(v_2)$;

2. $v_1$ is the root if and only if $v_2$ is the root;

3. if $v_1$ and $v_2$ are not the root, and $u_1$ and $u_2$ are the parents of $v_1$ and $v_2$,
respectively, then $u_1 \equiv_\uparrow u_2$.

It is easily seen that two nodes are upward-equivalent if the paths from the 
root to these two nodes are isomorphic in the sense that they have the same 
length and corresponding nodes have the same label. 

Example 4.9. In the example document of Figure[^we have, e.g., that vq 
V7 , Vs =t V 9 , Vii Vi2, but Vs '^ 13 - 

4.2.3. Two-way distinguishability

If we look both upward and downward in a document, we can define a notion of equivalence by combining the definitions of upward- and downward-$k$-equivalence: two nodes are $k$-equivalent if they are upward-equivalent, and if corresponding nodes on the isomorphic paths from the root to these nodes are downward-$k$-equivalent. More formally, we have the following recursive definition, where the recursion is on the depth of the first node.

Definition 4.10. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, let $v_1, v_2 \in V$, and let $k \geq 1$. Then, $v_1$ and $v_2$ are $k$-equivalent, denoted $v_1 \equiv^{\updownarrow}_{k} v_2$, if

1. $v_1 \equiv^{\downarrow}_{k} v_2$;

2. $v_1$ is the root if and only if $v_2$ is the root; and

3. if $v_1$ and $v_2$ are not the root, and $u_1$ and $u_2$ are the parents of $v_1$ and $v_2$, respectively, then $u_1 \equiv^{\updownarrow}_{k} u_2$.

Stated in a nonrecursive way, two nodes are $k$-equivalent if the paths from the root to these two nodes have equal length and corresponding nodes on these two paths are downward-$k$-equivalent.
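The nonrecursive reading lends itself to a direct check. The sketch below assumes the same reading of downward-$k$-equivalence as a capped multiset of child classes; `root_path`, `k_equivalent`, and the example document are illustrative names only.

```python
from collections import Counter

def downward_signature(children, label, v, k):
    # Equal signatures are taken to mean downward-k-equivalence (assumed reading).
    sizes = Counter(downward_signature(children, label, c, k)
                    for c in children.get(v, ()))
    return (label[v], frozenset((s, min(n, k)) for s, n in sizes.items()))

def root_path(parent, v):
    """Nodes on the path from the root down to v."""
    path = [v]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]

def k_equivalent(children, label, v1, v2, k):
    parent = {c: p for p, cs in children.items() for c in cs}
    p1, p2 = root_path(parent, v1), root_path(parent, v2)
    # Equal path length, and corresponding nodes downward-k-equivalent.
    return len(p1) == len(p2) and all(
        downward_signature(children, label, a, k)
        == downward_signature(children, label, b, k)
        for a, b in zip(p1, p2)
    )

# Hypothetical document: r has children a and b; a has one leaf child, b has two.
children = {"r": ["a", "b"], "a": ["c"], "b": ["d", "e"]}
label = {v: "x" for v in "rabcde"}
```

Here `c` and `d` are 1-equivalent but not 2-equivalent, since their parents `a` and `b` are separated once children are counted up to 2, whereas the siblings `d` and `e` are $k$-equivalent for every $k$.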

Example 4.11. Consider again the example document in Figure We have 
that, e.g. Vs =| vq uy, but no two of these nodes are fc-equivalent for any 
value of fc > 2 . Also, vs vs and vs vis, for any value of fc > 1 . 

By a straightforward inductive argument, the following is immediate from Corollary 4.7.

Proposition 4.12. Let $k \geq 1$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $v_1, v_2 \in V$. If $v_1 \equiv^{\updownarrow}_{k+1} v_2$, then $v_1 \equiv^{\updownarrow}_{k} v_2$.
4.3. Distinguishability of pairs of nodes at the syntactic level

We also define notions of distinguishability of pairs of nodes, by requiring 
that the pairs have subsumed or congruent signatures and that corresponding 
nodes on the (undirected) paths between begin and end points of both pairs are 
related under one of the notions defined in Subsection 4.2.


Definition 4.13. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, let $\vartheta$ be one of the syntactic relationships between nodes defined in Subsection 4.2, and let $v_1$, $w_1$, $v_2$, and $w_2$ be nodes in $V$. Then, $(v_1, w_1)$ $\vartheta$-subsumes $(v_2, w_2)$, denoted $(v_1, w_1) \geq_{\vartheta} (v_2, w_2)$ (respectively, $(v_1, w_1)$ and $(v_2, w_2)$ are $\vartheta$-congruent, denoted $(v_1, w_1) \cong_{\vartheta} (v_2, w_2)$) if

1. $(v_1, w_1) \geq (v_2, w_2)$ (respectively, $(v_1, w_1) \cong (v_2, w_2)$); and

2. for each node $y_1$ on the path from $v_1$ to $w_1$, $y_1 \mathrel{\vartheta} y_2$, where $y_2$ is the unique ancestor of $v_2$ or $w_2$ or both for which $(v_2, y_2) \in \mathrm{sig}(v_1, y_1)(D)$ (or, equivalently, $(y_2, w_2) \in \mathrm{sig}(y_1, w_1)(D)$).


Example 4.14. Consider again the example document in Figure We have 
that, e.g., (^2,^5) ==fe {v3,ve) for fc = 1 but not for any higher value of k; 
iv2,V5) ==k (?^10,^^13) for any value oi k > 1; (^2,^5) ==.^ iv4,Vg); (u5,U6) ==k 
(v3,V7) for any value of fc > 1; and (^6,^7) >_i (v2,V3), but not the other way 
around. 


The following observation is obvious from the definition. 

Proposition 4.15. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, let $\psi \in \{\geq, \cong\}$, let $\vartheta$ be one of the syntactic relationships between nodes defined in Subsection 4.2, and let $v_1$, $w_1$, $v_2$, and $w_2$ be nodes of $D$ such that $(v_1, w_1) \mathrel{\psi_{\vartheta}} (v_2, w_2)$. Let $y_1$ and $y_2$ be nodes on the path from $v_1$ to $w_1$, and let $z_1$ and $z_2$ be ancestors of $v_2$ or $w_2$ or both corresponding to $y_1$ and $y_2$, respectively. Then $(y_1, y_2) \mathrel{\psi_{\vartheta}} (z_1, z_2)$.

The mutual position of the nodes in the statement of Proposition 4.15 is illustrated in Figure 3.

From Proposition 3.4, the following is also obvious.

Proposition 4.16. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, let $\vartheta$ be one of the syntactic relationships between nodes defined in Subsection 4.2, and let $v_1$, $v_2$, $w_1$, and $w_2$ be nodes in $V$. If $v_1$ is an ancestor of $w_1$ or vice versa, then $(v_1, w_1) \cong_{\vartheta} (v_2, w_2)$ if and only if $(v_1, w_1) \geq_{\vartheta} (v_2, w_2)$.

Finally, from Definitions 4.10 and 4.13, the following is immediate.

Proposition 4.17. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, let $v_1, v_2 \in V$, and let $k \geq 1$. Then, $v_1 \equiv^{\updownarrow}_{k} v_2$ if and only if $(r, v_1) \cong^{\downarrow}_{k} (r, v_2)$.

Table 3 summarizes all of the distinguishability notions presented in this
section. The balance of the paper is devoted to identifying the languages which 
correspond in expressive power to each of these notions. 


(Footnote to Definition 4.13.) In the sequel, we call $y_1$ and $y_2$ corresponding nodes.
Figure 3: Mutual position of the nodes mentioned in the statement of Proposition 4.15.


Table 3: Distinguishability notions of Section 4.

distinguishability notion    notation                          defined in
expression-related           $\geq_{\mathrm{exp}}$             Definition 4.1
expression-equivalent        $\equiv_{\mathrm{exp}}$           Definition 4.1
downward-$k$-equivalent      $\equiv^{\downarrow}_{k}$         Definition 4.3
upward-equivalent            $\equiv^{\uparrow}$               Definition 4.8
$k$-equivalent               $\equiv^{\updownarrow}_{k}$       Definition 4.10
$\vartheta$-subsumes         $\geq_{\vartheta}$                Definition 4.13
$\vartheta$-congruent        $\cong_{\vartheta}$               Definition 4.13


5. Strictly downward languages 

We call a language downward if, for any expression $e$ in that language, and for any node $v$ of the document $D$ under consideration, all nodes in $e(D)(v)$ are descendants of $v$.

In this section, we consider languages with the stronger property that, for any expression $e$ in the language, and for any node $v$ of the document $D$ under consideration, $e(D)(v) = e(D')(v)$, where $D'$ is the subtree of $D$ rooted at $v$. We shall call such languages strictly downward.

Downward languages that are not strictly downward will be called weakly 
downward and are the subject of Section]^ 

Considering the nonbasic operations tabulated earlier, the language $X(E)$ is strictly downward if and only if $E$ contains neither upward navigation (“$\uparrow$”), nor second projection (“$\pi_2$”), nor inverse (“$\cdot^{-1}$”). It is the purpose of this section to investigate the expressive power of these languages at the document level, both
for query expressiveness and navigational expressiveness, and, in some cases, 
derive actual characterizations for these. 
5.1. Sufficient conditions for expression equivalence


If $e$ is an expression in a downward language $X(E)$, then it follows immediately from the definition that, given a node $v$ of the document $D$ under consideration, each node in $e(D)(v)$ is a descendant of $v$. Therefore, we only need to consider ancestor-descendant pairs of nodes, for which corresponding notions of subsumption and congruence coincide (Proposition 4.16).

The following property of $\cong^{\downarrow}_{k}$-congruence, $k \geq 1$, for ancestor-descendant pairs of nodes will turn out to be very useful.


Lemma 5.1. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, let $v_1$, $w_1$, and $v_2$ be nodes of $D$ such that $w_1$ is a descendant of $v_1$, and let $k \geq 1$. If $v_1 \equiv^{\downarrow}_{k} v_2$, then $v_2$ has a descendant $w_2$ in $D$ such that $(v_1, w_1) \cong^{\downarrow}_{k} (v_2, w_2)$.


Proof. The proof is by induction on the length of the path from $v_1$ to $w_1$. If $w_1 = v_1$, then, obviously, Lemma 5.1 is satisfied for $w_2 := v_2$. If $w_1 \neq v_1$, then let $y_1$ be the child of $v_1$ on the path to $w_1$. By Definition 4.3, $v_2$ has a child $y_2$ such that $y_1 \equiv^{\downarrow}_{k} y_2$. By the induction hypothesis, $y_2$ has a descendant $w_2$ in $D$ such that $(y_1, w_1) \cong^{\downarrow}_{k} (y_2, w_2)$. From Definition 4.13, it is now straightforward that $(v_1, w_1) \cong^{\downarrow}_{k} (v_2, w_2)$. □


We now link $\cong^{\downarrow}_{k}$-congruence of ancestor-descendant pairs of nodes with expressibility in strictly downward languages.


Proposition 5.2. Let $k \geq 1$, and let $E$ be the set of all nonbasic operations tabulated earlier, except for upward navigation (“$\uparrow$”), second projection (“$\pi_2$”), inverse (“$\cdot^{-1}$”), and selection on at least $m$ children satisfying some condition (“$\mathrm{ch}_{\geq m}(\cdot)$”) for $m > k$. Let $e$ be an expression in $X(E)$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, let $v_1$, $w_1$, $v_2$, and $w_2$ be nodes of $D$ such that $w_1$ is a descendant of $v_1$ and $w_2$ is a descendant of $v_2$. Assume furthermore that $(v_1, w_1) \cong^{\downarrow}_{k} (v_2, w_2)$. Then, $(v_1, w_1) \in e(D)$ if and only if $(v_2, w_2) \in e(D)$.


Proof. By symmetry, it suffices to show that $(v_1, w_1) \in e(D)$ implies $(v_2, w_2) \in e(D)$. We prove this by structural induction. For the atomic operators $\emptyset$, $\varepsilon$, $\hat{\ell}$ ($\ell \in \mathcal{L}$), and $\downarrow$, it is straightforward that Proposition 5.2 holds. We have now settled the base case and turn to the induction step.


1. $e := e_1/e_2$, with $e_1$ and $e_2$ satisfying Proposition 5.2. Assume that $(v_1, w_1) \in e(D)$. Then there exists $y_1 \in V$ such that $(v_1, y_1) \in e_1(D)$ and $(y_1, w_1) \in e_2(D)$. By the strictly downward nature of $X(E)$, $y_1$ is on the path from $v_1$ to $w_1$. Let $y_2$ be the node on the path from $v_2$ to $w_2$ corresponding to $y_1$. By Proposition 4.15, $(v_1, y_1) \cong^{\downarrow}_{k} (v_2, y_2)$ and $(y_1, w_1) \cong^{\downarrow}_{k} (y_2, w_2)$. By the induction hypothesis, $(v_2, y_2) \in e_1(D)$ and $(y_2, w_2) \in e_2(D)$. Hence, $(v_2, w_2) \in e(D)$.

2. $e := \pi_1(f)$, with $f$ satisfying Proposition 5.2. Assume that $(v_1, w_1) \in e(D)$. Then, necessarily $v_1 = w_1$, and, consequently, $v_2 = w_2$. From $(v_1, w_1) \in \pi_1(f)(D)$, it follows that there exists $z_1 \in V$ such that $(v_1, z_1) \in f(D)$. Since $v_1 \equiv^{\downarrow}_{k} v_2$, it also follows, by Lemma 5.1, that there exists a descendant $z_2$ of $v_2$ such that $(v_1, z_1) \cong^{\downarrow}_{k} (v_2, z_2)$. By the induction hypothesis, $(v_2, z_2) \in f(D)$. Hence, $(v_2, w_2) \in e(D)$.

3. $e := \mathrm{ch}_{\geq m}(f)$, with $m \leq k$ and $f$ satisfying Proposition 5.2. Assume that $(v_1, w_1) \in \mathrm{ch}_{\geq m}(f)(D)$. Hence, $v_1 = w_1$, which in turn implies $v_2 = w_2$. Let ${\downarrow}/\pi_1(f)(D)(v_1) = Y_1$ and let ${\downarrow}/\pi_1(f)(D)(v_2) = Y_2$. By assumption, $|Y_1| \geq m$. Now, let $y$ be a child of $v_1$ in $Y_1$ or a child of $v_2$ in $Y_2$, and let $z$ be a child of $v_1$ not in $Y_1$ or a child of $v_2$ not in $Y_2$. By assumption, there exists a node $y'$ such that $(y, y') \in f(D)$. Now, suppose that $y \equiv^{\downarrow}_{k} z$. Then, by Lemma 5.1, there exists a node $z'$ such that $(y, y') \cong^{\downarrow}_{k} (z, z')$. But then, by the induction hypothesis, $(z, z') \in f(D)$, contrary to our assumptions. We may therefore conclude that $y \not\equiv^{\downarrow}_{k} z$. Since furthermore $v_1 \equiv^{\downarrow}_{k} v_2$, it follows that, for all $y_1 \in Y_1$, there exists $y_2 \in Y_2$ such that $y_1 \equiv^{\downarrow}_{k} y_2$, and vice versa. Hence, for some $n \geq 1$, we can write $Y_1 = Y_{11} \cup \ldots \cup Y_{1n}$ and $Y_2 = Y_{21} \cup \ldots \cup Y_{2n}$ such that

(a) $Y_{11}, \ldots, Y_{1n}$ are maximal sets of mutually downward-$k$-equivalent children of $v_1$, and are hence pairwise disjoint;

(b) $Y_{21}, \ldots, Y_{2n}$ are maximal sets of mutually downward-$k$-equivalent children of $v_2$, and are hence pairwise disjoint; and

(c) for all $i = 1, \ldots, n$, each node of $Y_{1i}$ is downward-$k$-equivalent to each node of $Y_{2i}$.

If, for some $i$, $|Y_{1i}| \geq k$, it follows from $v_1 \equiv^{\downarrow}_{k} v_2$ that $|Y_{2i}| \geq k$, and, hence, that $|Y_2| \geq k \geq m$. If, on the other hand, for all $i = 1, \ldots, n$, $|Y_{1i}| < k$, it follows from $v_1 \equiv^{\downarrow}_{k} v_2$ that $|Y_{1i}| = |Y_{2i}|$, and, hence, that $|Y_1| = |Y_2|$. Since $|Y_1| \geq m$, it follows that, also in this case, $|Y_2| \geq m$. We may thus conclude that, in all cases, $|Y_2| \geq m$, and, hence, that $(v_2, v_2) \in \mathrm{ch}_{\geq m}(f)(D) = e(D)$.

4. $e := e_1 \cup e_2$, with $e_1$ and $e_2$ satisfying Proposition 5.2. Assume that $(v_1, w_1) \in e(D)$. Then, $(v_1, w_1) \in e_1(D)$ or $(v_1, w_1) \in e_2(D)$. Without loss of generality, assume the former. Then, by the induction hypothesis, $(v_2, w_2) \in e_1(D)$. Hence, $(v_2, w_2) \in e(D)$.

5. $e := e_1 \cap e_2$, with $e_1$ and $e_2$ satisfying Proposition 5.2. Assume that $(v_1, w_1) \in e(D)$. Then, $(v_1, w_1) \in e_1(D)$ and $(v_1, w_1) \in e_2(D)$. It follows by the induction hypothesis that $(v_2, w_2) \in e_1(D)$ and $(v_2, w_2) \in e_2(D)$. Hence, $(v_2, w_2) \in e(D)$.

6. $e := e_1 - e_2$, with $e_1$ and $e_2$ satisfying Proposition 5.2. Assume that $(v_1, w_1) \in e(D)$. Then $(v_1, w_1) \in e_1(D)$ and $(v_1, w_1) \notin e_2(D)$. By the induction hypothesis, $(v_2, w_2) \in e_1(D)$ and $(v_2, w_2) \notin e_2(D)$. (Indeed, if $(v_2, w_2) \in e_2(D)$, then, again by the induction hypothesis, $(v_1, w_1) \in e_2(D)$, a contradiction.) Hence, $(v_2, w_2) \in e(D)$.

□


Corollary 5.3. Let $k \geq 1$, and let $E$ be the set of all nonbasic operations tabulated earlier, except for upward navigation (“$\uparrow$”), second projection (“$\pi_2$”), inverse (“$\cdot^{-1}$”), and selection on at least $m$ children (“$\mathrm{ch}_{\geq m}(\cdot)$”) for $m > k$. Let $e$ be an expression in $X(E)$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, let $v_1$ and $v_2$ be nodes of $D$ such that $v_1 \equiv^{\downarrow}_{k} v_2$, and let $w_1$ be a descendant of $v_1$. If $(v_1, w_1) \in e(D)$, then there exists a descendant $w_2$ of $v_2$ such that $(v_2, w_2) \in e(D)$.


Proof. By Lemma 5.1, there exists a descendant $w_2$ of $v_2$ such that $(v_1, w_1) \cong^{\downarrow}_{k} (v_2, w_2)$. By Proposition 5.2, it now follows that $(v_2, w_2) \in e(D)$. □


Corollary 5.4. Let $k \geq 1$, and let $E$ be a set of nonbasic operations (among those tabulated earlier) not containing upward navigation (“$\uparrow$”), second projection (“$\pi_2$”), inverse (“$\cdot^{-1}$”), or selection on at least $m$ children satisfying some condition (“$\mathrm{ch}_{\geq m}(\cdot)$”) for $m > k$. Consider the language $X(E)$ or $C(E)$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $v_1$ and $v_2$ be nodes of $D$. If $v_1 \equiv^{\downarrow}_{k} v_2$, then $v_1 \equiv_{\mathrm{exp}} v_2$.


Proof. Let $e$ be an expression in the language under consideration such that $e(D)(v_1) \neq \emptyset$. Hence, there exists a descendant $w_1$ of $v_1$ such that $(v_1, w_1) \in e(D)$. Notice that $e$ is also an expression in the language considered in Corollary 5.3. Hence, there exists a descendant $w_2$ of $v_2$ such that $(v_2, w_2) \in e(D)$, so $e(D)(v_2) \neq \emptyset$. By symmetry, the converse also holds. We may thus conclude that $v_1 \equiv_{\mathrm{exp}} v_2$. □


We may thus conclude that downward-$k$-equivalence is a sufficient condition for expression-equivalence under a strictly downward language, provided $\mathrm{ch}_{\geq m}$ cannot be expressed for $m > k$.

Moreover, Corollary 5.4 no longer holds if this restriction is removed, as shown by the following counterexample.


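Example 5.5. Consider again the example document. We established in Example 4.4 that $v_2 \equiv^{\downarrow}_{1} v_3$, but $v_2 \not\equiv^{\downarrow}_{2} v_3$. In the language $X(\mathrm{ch}_{\geq 2})$, clearly $v_2 \not\equiv_{\mathrm{exp}} v_3$, as $\mathrm{ch}_{\geq 2}(\varepsilon)(D)(v_2) = \emptyset$, while $\mathrm{ch}_{\geq 2}(\varepsilon)(D)(v_3) \neq \emptyset$.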


5.2. Necessary conditions for expression equivalence

We now explore requirements on the set of nonbasic operations expressible in the language under which downward-$k$-equivalence ($k \geq 1$) is a necessary condition for expression-equivalence. As we have endeavored to make as few assumptions as possible, Proposition 5.6 also holds for a class of languages that are not (strictly) downward.


Proposition 5.6. Let $k \geq 1$, and let $E$ be a set of nonbasic operations containing set difference (“$-$”). Consider the language $X(E)$ or $C(E)$. Assume that, in this language, first projection (“$\pi_1$”) can be expressed, as well as selection on at least $m$ children satisfying some condition (“$\mathrm{ch}_{\geq m}(\cdot)$”), for all $m = 1, \ldots, k$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $v_1$ and $v_2$ be nodes of $D$. If $v_1 \equiv_{\mathrm{exp}} v_2$, then $v_1 \equiv^{\downarrow}_{k} v_2$.


Proof. Since expression-equivalence in the context of $X(E)$ implies expression-equivalence in the context of $C(E)$, we may assume without loss of generality that the language under consideration is $C(E)$. To prove Proposition 5.6, it suffices to show that expression-equivalence (“$\equiv_{\mathrm{exp}}$”) satisfies the conditions of Proposition 4.6.

1. If $v_1 \equiv_{\mathrm{exp}} v_2$, then $\lambda(v_1) = \lambda(v_2)$, for, otherwise, $\widehat{\lambda(v_1)}(D)(v_1) \neq \emptyset$, while $\widehat{\lambda(v_1)}(D)(v_2) = \emptyset$, a contradiction.

2. If $v_1 \equiv_{\mathrm{exp}} v_2$ and $v_1$ is not a leaf, then $v_2$ is not a leaf either, for, otherwise, $\mathrm{ch}_{\geq 1}(\varepsilon)(D)(v_1) \neq \emptyset$, while $\mathrm{ch}_{\geq 1}(\varepsilon)(D)(v_2) = \emptyset$, a contradiction. Let $w_1$ be a child of $v_1$, and let $w_2^1, \ldots, w_2^n$ be all children of $v_2$. Suppose for the sake of contradiction that, for all $i = 1, \ldots, n$, $w_1 \not\equiv_{\mathrm{exp}} w_2^i$. Then, by Proposition 4.2, there exists an expression $e_i$ in $C(E)$ such that $e_i(D)(w_1) \neq \emptyset$ and $e_i(D)(w_2^i) = \emptyset$, for all $i = 1, \ldots, n$. Now, let $e := \pi_1(e_1) \cap \ldots \cap \pi_1(e_n)$, which can be expressed in $C(E)$ (see the footnote on intersection). Then, $\mathrm{ch}_{\geq 1}(e)(D)(v_1) \neq \emptyset$ while $\mathrm{ch}_{\geq 1}(e)(D)(v_2) = \emptyset$, contradicting $v_1 \equiv_{\mathrm{exp}} v_2$. Hence, there does exist a child $w_2$ of $v_2$ such that $w_1 \equiv_{\mathrm{exp}} w_2$. Of course, the same also holds with the roles of $v_1$ and $v_2$ reversed.

3. Finally, let $v_1$ and $v_2$ be non-leaf nodes such that $v_1 \equiv_{\mathrm{exp}} v_2$, and let $w_1$ and $w_2$ be children of $v_1$ and $v_2$, respectively, such that $w_1 \equiv_{\mathrm{exp}} w_2$. For $i = 1, 2$, let $\tilde{w}_i$ be the set of all siblings of $w_i$ (including $w_i$ itself) that are expression-equivalent to $w_i$. As in the previous item, we can construct an expression $e$ in $C(E)$ such that $e(D)(w_1) \neq \emptyset$ (and hence $e(D)(w) \neq \emptyset$ for each node $w$ in $\tilde{w}_1$ or $\tilde{w}_2$) and $e(D)(w) = \emptyset$ for each sibling of $w_1$ not in $\tilde{w}_1$ and for each sibling of $w_2$ not in $\tilde{w}_2$. For the sake of contradiction, assume that $\min(|\tilde{w}_1|, k) \neq \min(|\tilde{w}_2|, k)$. Without loss of generality, assume that $\min(|\tilde{w}_1|, k) < \min(|\tilde{w}_2|, k)$. Hence, $\min(|\tilde{w}_1|, k) = |\tilde{w}_1|$. Let $m := \min(|\tilde{w}_2|, k)$. Then, $\mathrm{ch}_{\geq m}(e)(D)(v_1) = \emptyset$, while $\mathrm{ch}_{\geq m}(e)(D)(v_2) \neq \emptyset$, contradicting $v_1 \equiv_{\mathrm{exp}} v_2$. We may thus conclude that $\min(|\tilde{w}_1|, k) = \min(|\tilde{w}_2|, k)$.

□ 
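The intersection used in the second item of the proof relies on the identity $f_1 \cap f_2 = \pi_1(\varepsilon - (\pi_1(\varepsilon - f_1) \cup \pi_1(\varepsilon - f_2)))$ for expressions whose results are contained in $\varepsilon(D)$. The sketch below checks that identity exhaustively on a four-node universe, modelling relations as Python sets of pairs; `pi1`, `subsets`, and `identity_holds` are illustrative names, not notation from the paper.

```python
from itertools import combinations

def pi1(rel):
    """First projection: the identity pairs (v, v) for every v with an outgoing pair."""
    return {(v, v) for (v, _) in rel}

nodes = range(4)
eps = {(v, v) for v in nodes}  # the identity relation on the nodes

def subsets(s):
    """All subsets of the set s."""
    items = sorted(s)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]

# Check the identity for every pair of relations f1, f2 contained in eps.
identity_holds = all(
    f1 & f2 == pi1(eps - (pi1(eps - f1) | pi1(eps - f2)))
    for f1 in subsets(eps)
    for f2 in subsets(eps)
)
```

The check is just De Morgan's law relative to $\varepsilon$; the projections leave subsets of $\varepsilon$ unchanged and only serve to keep the set operations inside a projection, as the core language requires.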


Notice that the languages satisfying the statement of Proposition 5.6 need not contain any navigation operations (“$\downarrow$” or “$\uparrow$”). Of course, in the context of this section, we are interested in languages in which downward navigation (“$\downarrow$”) is possible. Specializing Proposition 5.6 to this case, we may thus conclude that downward-$k$-equivalence is a necessary condition for expression-equivalence under a strictly downward language containing first projection (“$\pi_1$”) and set difference (“$-$”), provided selection on at least $m$ children satisfying some condition (“$\mathrm{ch}_{\geq m}$”) can be expressed for all $m = 1, \ldots, k$.

5.3. Characterization of expression equivalence

(Footnote to the proof of Proposition 5.6.) Let $f_1$ and $f_2$ be expressions in $C(E)$ such that $f_1(D) \subseteq \varepsilon(D)$ and $f_2(D) \subseteq \varepsilon(D)$. Then, $f_1 \cap f_2$ can be expressed in $C(E)$ as $\pi_1(\varepsilon - (\pi_1(\varepsilon - f_1) \cup \pi_1(\varepsilon - f_2)))$.

The languages containing downward navigation (“$\downarrow$”) and satisfying both Corollary 5.4 of Subsection 5.1 and Proposition 5.6 of Subsection 5.2 are $X(\downarrow, \pi_1, \mathrm{ch}_{\geq 1}(\cdot), \ldots, \mathrm{ch}_{\geq k}(\cdot), -)$ and $C(\downarrow, \pi_1, \mathrm{ch}_{\geq 1}(\cdot), \ldots, \mathrm{ch}_{\geq k}(\cdot), -)$. We call these languages the strictly downward XPath algebra with counting up to $k$ and the strictly downward core XPath algebra with counting up to $k$, respectively. Combining the aforementioned results, we get the following.


Theorem 5.7. Let $k \geq 1$, and consider the strictly downward (core) XPath algebra with counting up to $k$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $v_1$ and $v_2$ be nodes of $D$. Then $v_1 \equiv_{\mathrm{exp}} v_2$ if and only if $v_1 \equiv^{\downarrow}_{k} v_2$.


A special case arises when $k = 1$, since selection on at least one child satisfying some condition (“$\mathrm{ch}_{\geq 1}(\cdot)$”) can be expressed in terms of the other operations required by Theorem 5.7, by Proposition 2.4. The languages we then obtain, $X(\downarrow, \pi_1, -)$ and $C(\downarrow, \pi_1, -)$, are called the strictly downward XPath algebra and the strictly downward core XPath algebra, respectively. We have the following.


Corollary 5.8. Consider the strictly downward (core) XPath algebra. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $v_1$ and $v_2$ be nodes of $D$. Then $v_1 \equiv_{\mathrm{exp}} v_2$ if and only if $v_1 \equiv^{\downarrow}_{1} v_2$.


5.4. Characterization of navigational expressiveness 

We shall now investigate the expressiveness of strictly downward languages at the document level. In other words, we shall address the question whether, given a document, we can characterize when a set of pairs of nodes of that document is the result of some query in the language under consideration applied to that document. Results of this type are often referred to as BP-characterizations, after Bancilhon and Paredaens [53], who first proved such results for Codd’s relational calculus and algebra, respectively (cf. [53]).

We start by proving a converse to Proposition 5.2.


Proposition 5.9. Let $k \geq 1$, and let $E$ be a set of nonbasic operations containing downward navigation (“$\downarrow$”) and set difference (“$-$”). Consider the language $X(E)$ or $C(E)$. Assume that, in this language, first projection (“$\pi_1$”) can be expressed, as well as selection on at least $m$ children satisfying some condition (“$\mathrm{ch}_{\geq m}(\cdot)$”), for all $m = 1, \ldots, k$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $v_1$, $w_1$, $v_2$, and $w_2$ be nodes of $D$ such that $w_1$ is a descendant of $v_1$ and $w_2$ is a descendant of $v_2$. Assume furthermore that, for each expression $e$ in the language, $(v_1, w_1) \in e(D)$ if and only if $(v_2, w_2) \in e(D)$. Then $(v_1, w_1) \cong^{\downarrow}_{k} (v_2, w_2)$.


Proof. First notice that, by assumption, $(v_2, w_2) \in \mathrm{sig}(v_1, w_1)(D)$, and vice versa. Hence, $(v_1, w_1) \cong (v_2, w_2)$. Let $y_1$ be a node on the path from $v_1$ to $w_1$, and let $y_2$ be the corresponding node on the path from $v_2$ to $w_2$. By construction, $(v_1, y_1) \cong (v_2, y_2)$ and $(y_1, w_1) \cong (y_2, w_2)$. Now, let $f$ be any expression in the language such that $f(D)(y_1) \neq \emptyset$. Then, $(y_1, y_1) \in \pi_1(f)(D)$. Let $e := \mathrm{sig}(v_1, y_1)/\pi_1(f)/\mathrm{sig}(y_1, w_1)$. By construction, $(v_1, w_1) \in e(D)$. Hence, by assumption, $(v_2, w_2) \in e(D)$, which implies $(y_2, y_2) \in \pi_1(f)(D)$, that is, $f(D)(y_2) \neq \emptyset$. The same holds vice versa, and we may thus conclude that $y_1 \equiv_{\mathrm{exp}} y_2$, and, hence, by Proposition 5.6, $y_1 \equiv^{\downarrow}_{k} y_2$. We may thus conclude that $(v_1, w_1) \cong^{\downarrow}_{k} (v_2, w_2)$. □
Combining Propositions 5.2 and 5.9, we obtain the following.

Corollary 5.10. Let $k \geq 1$, and consider the strictly downward (core) XPath algebra with counting up to $k$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $v_1$, $w_1$, $v_2$, and $w_2$ be nodes of $D$ such that $w_1$ is a descendant of $v_1$ and $w_2$ is a descendant of $v_2$. Then, the property that, for each expression $e$ in the language under consideration, $(v_1, w_1) \in e(D)$ if and only if $(v_2, w_2) \in e(D)$ is equivalent to the property $(v_1, w_1) \cong^{\downarrow}_{k} (v_2, w_2)$.

In order to state our first BP-result, we need the following two lemmas. 


Lemma 5.11. Let $k \geq 1$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $v_1$ be a node of $D$. There exists an expression $e_{v_1}$ in the strictly downward core XPath algebra with counting up to $k$ such that, for each node $v_2$ of $D$, $e_{v_1}(D)(v_2) \neq \emptyset$ if and only if $v_1 \equiv^{\downarrow}_{k} v_2$.


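Proof. Let $w$ be any node of $D$ such that $v_1 \not\equiv^{\downarrow}_{k} w$. By Theorem 5.7, $v_1 \not\equiv_{\mathrm{exp}} w$. By Proposition 4.2, there exists an expression $f_{v_1,w}$ in the strictly downward core XPath algebra with counting up to $k$ such that $f_{v_1,w}(D)(v_1) \neq \emptyset$ and $f_{v_1,w}(D)(w) = \emptyset$. Now consider the expression

$$e_{v_1} := \pi_1\Bigl(\bigcap_{w \in V \,\&\, v_1 \not\equiv^{\downarrow}_{k} w} \pi_1(f_{v_1,w})\Bigr),$$

which is also in the strictly downward core XPath algebra with counting up to $k$. By construction, $e_{v_1}(D)(v_1) \neq \emptyset$. Now consider a node $v_2$ of $D$. If $v_1 \equiv^{\downarrow}_{k} v_2$, then, by Theorem 5.7, $v_1 \equiv_{\mathrm{exp}} v_2$. Hence, by definition, $e_{v_1}(D)(v_2) \neq \emptyset$. If, on the other hand, $v_1 \not\equiv^{\downarrow}_{k} v_2$, then, by construction, $e_{v_1}(D)(v_2) = \emptyset$. □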


Lemma 5.12. Let $k \geq 1$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $v_1$ and $w_1$ be nodes of $D$ such that $w_1$ is a descendant of $v_1$. There exists an expression $e_{v_1,w_1}$ in the strictly downward core XPath algebra with counting up to $k$ such that, for all nodes $v_2$ and $w_2$ of $D$ with $w_2$ a descendant of $v_2$, $(v_2, w_2) \in e_{v_1,w_1}(D)$ if and only if $(v_1, w_1) \cong^{\downarrow}_{k} (v_2, w_2)$.

Proof. From Lemma 5.11, we know that, for each node $y_1$ of $D$, there exists an expression $e_{y_1}$ in the strictly downward core XPath algebra with counting up to $k$ such that, for each node $y_2$ of $D$, $e_{y_1}(D)(y_2) \neq \emptyset$ if and only if $y_1 \equiv^{\downarrow}_{k} y_2$. Now, let $v_1$ and $w_1$ be nodes of $D$ such that $w_1$ is a descendant of $v_1$, and let $v_1 = y_{11}, \ldots, y_{1n} = w_1$ be the path from $v_1$ to $w_1$ in $D$. Define

$$e_{v_1,w_1} := \pi_1(e_{y_{11}})/{\downarrow}/\pi_1(e_{y_{12}})/{\downarrow}/\ldots/{\downarrow}/\pi_1(e_{y_{1n}}),$$

which is also in the strictly downward core XPath algebra with counting up to $k$. By construction, $(v_1, w_1) \in e_{v_1,w_1}(D)$. Let $v_2$ and $w_2$ be nodes of $D$ such that $w_2$ is a descendant of $v_2$. If $(v_1, w_1) \cong^{\downarrow}_{k} (v_2, w_2)$, then, by Corollary 5.10, $(v_2, w_2) \in e_{v_1,w_1}(D)$. Conversely, if $(v_2, w_2) \in e_{v_1,w_1}(D)$, then, by construction, $(v_1, w_1) \cong (v_2, w_2)$. Thus, let $v_2 = y_{21}, \ldots, y_{2n} = w_2$ be the path from $v_2$ to $w_2$ in $D$. Again by construction, it follows that, for $j = 1, \ldots, n$, $e_{y_{1j}}(D)(y_{2j}) \neq \emptyset$, or, equivalently, that $y_{1j} \equiv^{\downarrow}_{k} y_{2j}$. Hence, $(v_1, w_1) \cong^{\downarrow}_{k} (v_2, w_2)$. □

We are now ready to state the actual result.


Theorem 5.13. Let $k \geq 1$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $R \subseteq V \times V$. Then, there exists an expression $e$ in the strictly downward (core) XPath algebra with counting up to $k$ such that $e(D) = R$ if and only if,

1. for all $v, w \in V$, $(v, w) \in R$ implies $w$ is a descendant of $v$; and,

2. for all $v_1, w_1, v_2, w_2 \in V$ with $w_1$ a descendant of $v_1$, $w_2$ a descendant of $v_2$, and $(v_1, w_1) \cong^{\downarrow}_{k} (v_2, w_2)$, $(v_1, w_1) \in R$ implies $(v_2, w_2) \in R$.

Proof. To see the “only if,” it suffices to notice that the first condition follows from the downward character of the language, and the second from Corollary 5.10. The remainder of the proof concerns the “if.” From Lemma 5.12, we know that, for all nodes $v_1$ and $w_1$ of $D$ such that $w_1$ is a descendant of $v_1$, there exists an expression $e_{v_1,w_1}$ in $C(E)$ such that, for all nodes $v_2$ and $w_2$ of $D$, $(v_2, w_2) \in e_{v_1,w_1}(D)$ if and only if $(v_1, w_1) \cong^{\downarrow}_{k} (v_2, w_2)$. Now consider the expression

$$e := \bigcup_{(v_1, w_1) \in R} e_{v_1,w_1}.$$

This expression, which is well defined because $(v_1, w_1) \in R$ by assumption implies that $w_1$ is a descendant of $v_1$, is also in $C(E)$ (and hence also in $X(E)$). It remains to show that $e(D) = R$. Clearly, $R \subseteq e(D)$. We prove the reverse inclusion. To this end, let $v_2$ and $w_2$ be nodes such that $(v_2, w_2) \in e(D)$. By construction, there exists a pair $(v_1, w_1) \in R$, with $w_1$ a descendant of $v_1$, such that $(v_2, w_2) \in e_{v_1,w_1}(D)$. Hence, $(v_1, w_1) \cong^{\downarrow}_{k} (v_2, w_2)$. But then, by assumption, also $(v_2, w_2) \in R$. So, $e(D) \subseteq R$. □
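The two conditions of Theorem 5.13 can be tested mechanically on a small document. The following sketch only checks the conditions; it does not construct the defining expression. It assumes that downward-$k$-equivalence is captured by a capped multiset of child classes, that a node counts as its own descendant, and that congruence of ancestor-descendant pairs amounts to equal path length with pairwise downward-$k$-equivalent nodes. The document and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical document: r has children a and b; a has one leaf child, b has two.
children = {"r": ["a", "b"], "a": ["c"], "b": ["d", "e"]}
label = {v: "x" for v in "rabcde"}
parent = {c: p for p, cs in children.items() for c in cs}

def signature(v, k):
    # Equal signatures are taken to mean downward-k-equivalence (assumed reading).
    sizes = Counter(signature(c, k) for c in children.get(v, ()))
    return (label[v], frozenset((s, min(n, k)) for s, n in sizes.items()))

def path_down(v, w):
    """Nodes from v down to w, or None if w is not a descendant of v."""
    path = [w]
    while path[-1] != v:
        if path[-1] not in parent:
            return None
        path.append(parent[path[-1]])
    return path[::-1]

def congruent(p1, p2, k):
    return len(p1) == len(p2) and all(
        signature(a, k) == signature(b, k) for a, b in zip(p1, p2)
    )

def satisfies_conditions(R, k):
    for v1, w1 in R:
        p1 = path_down(v1, w1)
        if p1 is None:  # condition 1: only ancestor-descendant pairs
            return False
        for v2, w2 in product(label, repeat=2):
            p2 = path_down(v2, w2)
            # condition 2: R is closed under congruence of pairs
            if p2 is not None and congruent(p1, p2, k) and (v2, w2) not in R:
                return False
    return True
```

On this document, the relation $\{(a, c)\}$ fails the conditions for $k = 1$ but satisfies them for $k = 2$, since counting up to 2 distinguishes `a` from `b`.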


As before, we can specialize Theorem 5.13 to the strictly downward (core) 
XPath algebra. 


Corollary 5.14. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $R \subseteq V \times V$. There exists an expression $e$ in the strictly downward (core) XPath algebra such that $e(D) = R$ if and only if,

1. for all $v, w \in V$, $(v, w) \in R$ implies $w$ is a descendant of $v$;

2. for all $v_1, w_1, v_2, w_2 \in V$ with $w_1$ a descendant of $v_1$, $w_2$ a descendant of $v_2$, and $(v_1, w_1) \cong^{\downarrow}_{1} (v_2, w_2)$, $(v_1, w_1) \in R$ implies $(v_2, w_2) \in R$.

We can also recast Theorem 5.13 in terms of node-level navigation.

Theorem 5.15. Let $k \geq 1$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, let $v$ be a node of $D$, and let $W \subseteq V$. Then there exists an expression $e$ in the strictly downward (core) XPath algebra with counting up to $k$ such that $e(D)(v) = W$ if and only if all nodes of $W$ are descendants of $v$, and, for all $w_1, w_2 \in V$ with $(v, w_1) \cong^{\downarrow}_{k} (v, w_2)$, $w_1 \in W$ implies $w_2 \in W$.
Proof. Only if. Let $e$ be an expression in the language under consideration such that $e(D)(v) = W$. Let $w_1, w_2 \in V$ be descendants of $v$ with $(v, w_1) \cong^{\downarrow}_{k} (v, w_2)$ and assume that $w_1 \in W = e(D)(v)$. Hence, $(v, w_1) \in e(D)$. By Corollary 5.10, $(v, w_2) \in e(D)$. Hence, $w_2 \in e(D)(v) = W$.

If. Let $W \subseteq V$ satisfy the property that all nodes of $W$ are descendants of $v$, and, for all $w_1, w_2 \in V$ with $(v, w_1) \cong^{\downarrow}_{k} (v, w_2)$, $w_1 \in W$ implies $w_2 \in W$. Let $R := \{(v', w_2) \mid \text{there exists } w_1 \in W \text{ such that } (v, w_1) \cong^{\downarrow}_{k} (v', w_2)\}$. Clearly, $R$ satisfies the properties of Theorem 5.13. Hence, there exists an expression $e$ in the language under consideration such that $R = e(D)$. Clearly, $W \subseteq e(D)(v)$. We prove the reverse inclusion. To this end, let $w_2 \in e(D)(v)$, i.e., $(v, w_2) \in R$. Then there exists $w_1 \in W$ such that $(v, w_1) \cong^{\downarrow}_{k} (v, w_2)$. By the property that $W$ satisfies, $w_2 \in W$. Hence, $e(D)(v) \subseteq W$, and, therefore, $e(D)(v) = W$. □


Again, we can specialize Theorem 5.15 to the strictly downward (core) XPath 
algebra. 


Corollary 5.16. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, let $v$ be a node of $D$, and let $W \subseteq V$. Then there exists an expression $e$ in the strictly downward (core) XPath algebra such that $e(D)(v) = W$ if and only if all nodes of $W$ are descendants of $v$, and, for all nodes $w_1$ and $w_2$ of $D$ with $(v, w_1) \cong^{\downarrow}_{1} (v, w_2)$, $w_1 \in W$ implies $w_2 \in W$.

A special case of Theorem 5.15 is when we are only interested in navigation from the root.


Theorem 5.17. Let $k \geq 1$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $W \subseteq V$. Then there exists an expression $e$ in the strictly downward (core) XPath algebra with counting up to $k$ such that $e(D)(r) = W$ if and only if, for all nodes $w_1$ and $w_2$ of $D$ with $w_1 \equiv^{\updownarrow}_{k} w_2$, $w_1 \in W$ implies $w_2 \in W$.


Proof. From Theorem 5.15, it immediately follows that there exists an expression $e$ in the language under consideration such that $e(D)(r) = W$ if and only if, for $w_1, w_2 \in V$ with $(r, w_1) \cong^{\downarrow}_{k} (r, w_2)$, $w_1 \in W$ implies $w_2 \in W$. By Proposition 4.17, $(r, w_1) \cong^{\downarrow}_{k} (r, w_2)$ is equivalent to $w_1 \equiv^{\updownarrow}_{k} w_2$. □


The specialization of Theorem 5.17 to the case of the strictly downward (core) XPath algebra is as follows.

Corollary 5.18. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $W \subseteq V$. Then there exists an expression $e$ in the strictly downward (core) XPath algebra such that $e(D)(r) = W$ if and only if, for all nodes $w_1$ and $w_2$ of $D$ with $w_1 \equiv^{\updownarrow}_{1} w_2$, $w_1 \in W$ implies $w_2 \in W$.


To conclude this section, we observe that none of the characterization results above distinguish between the language $X(E)$ and the corresponding core language $C(E)$. This is not surprising, as, for all downward languages, they have the same expressive power, not only at the navigational level for a given document, but also at the level of queries, i.e., for each expression $e$ in $X(E)$, there exists an equivalent expression $e'$ in the corresponding core language $C(E)$, meaning that, for each document $D$, $e(D) = e'(D)$. To this end, we prove a slightly stronger result.

Theorem 5.19. Let $E$ be a set of nonbasic operations containing downward navigation (“$\downarrow$”) and first projection (“$\pi_1$”), and not containing upward navigation (“$\uparrow$”) and inverse (“$\cdot^{-1}$”). Let $e$ be an expression in the language under consideration. With the exception of intersection (“$\cap$”) and set difference (“$-$”) operations used as operands in boolean combinations of subexpressions of the language within a first projection or conditional operation, all intersection and set difference operations can be eliminated, to the extent that these operations occur in the language under consideration.

Proof. The proof goes by structural induction. Therefore, consider the expression $e_1 \cap e_2$, respectively, $e_1 - e_2$ (to the extent these operations occur in the language under consideration), where $e_1$ and $e_2$ are expressions not containing eliminable intersection and set difference operations. For $i = 1, 2$, we may write

$$e_i = c_{i0}/{\downarrow}/c_{i1}/{\downarrow}/\ldots/{\downarrow}/c_{i n_i},$$

where, for $j = 0, \ldots, n_i$, $c_{ij}$ is an expression in $C(E)$ with the property that, for each document $D$, $c_{ij}(D) \subseteq \varepsilon(D)$. From here on, we consider both cases separately.

1. Intersection. Clearly, if $n_1 \neq n_2$, then, for each document $D$, $e_1 \cap e_2(D) = \emptyset = \emptyset(D)$. In the other case, let $n := n_1 = n_2$. For $j = 0, \ldots, n$, let $c_j := \pi_1(c_{1j} \cap c_{2j})$, which is an expression of $C(E)$, equivalent to $c_{1j} \cap c_{2j}$. Let

$$e' := c_0/{\downarrow}/c_1/{\downarrow}/\ldots/{\downarrow}/c_{n-1}/{\downarrow}/c_n.$$

A straightforward set-theoretical argument reveals that, for each document $D$, $e'(D) = e_1 \cap e_2(D)$.

2. Difference. Clearly, if $n_1 \neq n_2$, then, for each document $D$, $e_1 - e_2(D) = e_1(D)$. In the other case, let $n := n_1 = n_2$. For $j = 0, \ldots, n$, let $e'_j$ be $e_1$ in which $c_{1j}$ is replaced by $\pi_1(c_{1j} - c_{2j})$, which is an expression of $C(E)$, equivalent to $c_{1j} - c_{2j}$. Let

$$e' := e'_0 \cup e'_1 \cup \ldots \cup e'_{n-1} \cup e'_n,$$

which is also in $C(E)$. A straightforward set-theoretical argument reveals that, for each document $D$, $e'(D) = e_1 - e_2(D)$.

□

Corollary 5.20. Let $E$ be a set of nonbasic operations containing downward navigation (“$\downarrow$”) and first projection (“$\pi_1$”), and not containing upward navigation (“$\uparrow$”) and inverse (“$\cdot^{-1}$”). Then, for each expression $e$ in $X(E)$, there exists an expression $e'$ in $C(E)$ such that, for each document $D$, $e(D) = e'(D)$.

By Theorem 5.19, we may even disallow set difference or intersection operations (to the extent they occur in the language under consideration), except those used as operands of boolean combinations of subexpressions inside a projection operation, without losing expressive power.
5.5. Strictly downward languages not containing set difference

So far, the characterizations of strictly downward languages involved only 
languages containing the set difference operator. One could, therefore, wonder 
if it is possible to provide similar characterizations for languages not containing 
set difference. However, the absence of set difference and the logical negation 
that is inherently embedded in it has as a side effect that it is no longer always 
possible to exploit equivalences or derive them. 


5.5.1. Weaker notions of downward and two-way distinguishability

Therefore, one would like to consider an asymmetric version of downward
$k$-equivalence, say “downward $k$-relatedness,” which, for the appropriate
language, could correspond to expression relatedness. For $k = 1$, such an
approach could lead to the following definitions.


Definition 5.21. Let $D = (V, Ed, r, \lambda)$ be a document, and let $v_1, v_2 \in V$.
Then,

1. $v_1$ and $v_2$ are downward-related, denoted $v_1 \geq_{\downarrow} v_2$, if

(a) $\lambda(v_1) = \lambda(v_2)$; and

(b) for each child $w_1$ of $v_1$, there exists a child $w_2$ of $v_2$ such that
$w_1 \geq_{\downarrow} w_2$.

2. $v_1$ and $v_2$ are weakly downward-equivalent, denoted
$v_1 \gtrless_{\downarrow} v_2$, if $v_1 \geq_{\downarrow} v_2$ and $v_2 \geq_{\downarrow} v_1$.

Obviously, downward 1-equivalence implies weak downward equivalence. The
converse, however, is not true, as illustrated by the following simple example.

Example 5.22. Consider the document in Figure 4. Labels have been omitted,
because they are not relevant in this discussion. (We assume all nodes have the
same label.) Obviously, $x_1 \equiv^{1}_{\downarrow} x_2$, hence
$x_1 \gtrless_{\downarrow} x_2$. In particular, $x_1 \geq_{\downarrow} x_2$ and
$x_2 \geq_{\downarrow} x_1$. Also, $y_1 \geq_{\downarrow} x_2$, as the second condition
to be verified is vacuously satisfied in this case. We may thus conclude that
$v_1 \gtrless_{\downarrow} v_2$. However, $v_1 \not\equiv^{1}_{\downarrow} v_2$, as there
is no child of $v_2$ that is downward 1-equivalent to $y_1$.
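The two notions of Definition 5.21 can be checked mechanically on a small tree. The sketch below is only an illustration: the document `CHILDREN` is a hypothetical tree that we chose to be consistent with the claims of Example 5.22 (the actual Figure 4 may differ), and `downward_1_equivalent` encodes our assumed reading of downward 1-equivalence as "same label, and the children match each other in both directions."

```python
from typing import Dict, List

# Hypothetical document (an assumption, not the paper's Figure 4).
# All nodes carry the same label, as in Example 5.22.
CHILDREN: Dict[str, List[str]] = {
    "r": ["v1", "v2"],
    "v1": ["x1", "y1"],
    "v2": ["x2"],
    "x1": ["z1"],
    "x2": ["z2"],
    "y1": [],
    "z1": [],
    "z2": [],
}
LABEL: Dict[str, str] = {v: "a" for v in CHILDREN}


def downward_related(v1: str, v2: str) -> bool:
    """Definition 5.21(1): equal labels, and every child of v1 is
    downward-related to some child of v2."""
    if LABEL[v1] != LABEL[v2]:
        return False
    return all(
        any(downward_related(w1, w2) for w2 in CHILDREN[v2])
        for w1 in CHILDREN[v1]
    )


def weakly_downward_equivalent(v1: str, v2: str) -> bool:
    """Definition 5.21(2): downward-related in both directions."""
    return downward_related(v1, v2) and downward_related(v2, v1)


def downward_1_equivalent(v1: str, v2: str) -> bool:
    """Assumed reading of downward 1-equivalence: equal labels, and each
    child on either side has a downward 1-equivalent child on the other."""
    if LABEL[v1] != LABEL[v2]:
        return False
    forth = all(
        any(downward_1_equivalent(w1, w2) for w2 in CHILDREN[v2])
        for w1 in CHILDREN[v1]
    )
    back = all(
        any(downward_1_equivalent(w1, w2) for w1 in CHILDREN[v1])
        for w2 in CHILDREN[v2]
    )
    return forth and back
```

On this tree, `weakly_downward_equivalent("v1", "v2")` holds while `downward_1_equivalent("v1", "v2")` does not, because the leaf `y1` has no downward 1-equivalent counterpart among the children of `v2`.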


Notice that, in Example 5.22, there is even no child of $v_2$ that is weakly
downward equivalent to $y_1$! Therefore, we shall not even attempt to generalize
Definition 5.21 to the case where $k > 1$, as there is no straightforward way to
adapt the third condition of Definition 4.3.

We conclude this digression on alternatives for downward 1-equivalence by
providing analogous alternatives for 1-equivalence.


Definition 5.23. Let $D = (V, Ed, r, \lambda)$ be a document, and let $v_1, v_2 \in V$.
Then,

1. $v_1$ and $v_2$ are related, denoted $v_1 \geq_{\updownarrow} v_2$, if

(a) $v_1 \geq_{\downarrow} v_2$;

(b) $v_1$ is the root if and only if $v_2$ is the root; and

(c) if $v_1$ and $v_2$ are not the root, and $u_1$ and $u_2$ are the parents of $v_1$
and $v_2$, respectively, then $u_1 \geq_{\updownarrow} u_2$.

2. $v_1$ and $v_2$ are weakly equivalent, denoted $v_1 \gtrless_{\updownarrow} v_2$, if
$v_1 \geq_{\updownarrow} v_2$ and $v_2 \geq_{\updownarrow} v_1$.


Figure 4: Document of Example 5.22.


Table 4: Distinguishability notions of Section 5.5.1.

distinguishability notion     notation                     defined in
downward-related              $\geq_{\downarrow}$          Definition 5.21
weakly-downward-equivalent    $\gtrless_{\downarrow}$      Definition 5.21
related                       $\geq_{\updownarrow}$        Definition 5.23
weakly-equivalent             $\gtrless_{\updownarrow}$    Definition 5.23


Example 5.24. Consider again the document in Figure 4. Observe that
$v_1 \gtrless_{\updownarrow} v_2$. Furthermore, $y_1 \geq_{\updownarrow} x_2$, but not the
other way around.
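The two-way notions of Definition 5.23 add a walk towards the root on top of the downward check. The following sketch illustrates this on a hypothetical document of our own choosing (again only assumed to be consistent with Examples 5.22 and 5.24, not taken from Figure 4); all nodes are assumed to carry the same label, so the label test is omitted.

```python
from typing import Dict, List, Optional

# Hypothetical document (an assumption); every node has the same label.
CHILDREN: Dict[str, List[str]] = {
    "r": ["v1", "v2"],
    "v1": ["x1", "y1"],
    "v2": ["x2"],
    "x1": ["z1"],
    "x2": ["z2"],
    "y1": [],
    "z1": [],
    "z2": [],
}

# Parent map derived from CHILDREN; the root has no parent.
PARENT: Dict[str, Optional[str]] = {"r": None}
for parent, kids in CHILDREN.items():
    for kid in kids:
        PARENT[kid] = parent


def downward_related(v1: str, v2: str) -> bool:
    """Definition 5.21(1) for a uniformly labeled document."""
    return all(
        any(downward_related(w1, w2) for w2 in CHILDREN[v2])
        for w1 in CHILDREN[v1]
    )


def related(v1: str, v2: str) -> bool:
    """Definition 5.23(1): downward-related, both or neither is the root,
    and (for non-root nodes) the parents are related in turn."""
    if not downward_related(v1, v2):
        return False
    u1, u2 = PARENT[v1], PARENT[v2]
    if (u1 is None) != (u2 is None):
        return False
    if u1 is None:
        return True
    return related(u1, u2)


def weakly_equivalent(v1: str, v2: str) -> bool:
    """Definition 5.23(2): related in both directions."""
    return related(v1, v2) and related(v2, v1)
```

On this tree, `related("y1", "x2")` holds but `related("x2", "y1")` does not, since the child of `x2` has no counterpart below the leaf `y1`.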


Table 4 summarizes all of the distinguishability notions presented in this
section.

The following analogue of Proposition 4.17 is straightforward.


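Proposition 5.25. Let $D = (V, Ed, r, \lambda)$ be a document, and let $v_1, v_2 \in V$.
Then,

1. $v_1 \geq_{\updownarrow} v_2$ if and only if $(r, v_1) \geqq_{\downarrow} (r, v_2)$; and

2. $v_1 \gtrless_{\updownarrow} v_2$ if and only if
$(r, v_1) \gtreqqless_{\downarrow} (r, v_2)$.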


5.5.2. Towards characterizing expression equivalence and navigational expressiveness

The approach we shall take here is to review the results in Sections 5.1–5.3
and examine to which extent these results, in the case where $k = 1$, allow
replacing downward 1-equivalence by weak downward equivalence.

We start by observing that the analogue of Lemma 5.1 does not hold. Indeed,
in the example document of Example 5.22 shown in Figure 4,
$v_1 \gtrless_{\downarrow} v_2$. Also, there is no child of $v_2$ that is weakly
downward equivalent to $y_1$. Hence, there is no node $z$ for which
$(v_1, y_1) \gtreqqless_{\downarrow} (v_2, z)$. On the other hand, we can restrict
Lemma 5.1 to downward relatedness:


Lemma 5.26. Let $D = (V, Ed, r, \lambda)$ be a document, let $v_1$, $w_1$, and $v_2$ be
nodes of $D$ such that $w_1$ is a descendant of $v_1$. If $v_1 \geq_{\downarrow} v_2$,
then $v_2$ has a descendant $w_2$ in $D$ such that
$(v_1, w_1) \geqq_{\downarrow} (v_2, w_2)$.


Proposition 5.2 relies on Lemma 5.1 to prove the inductive step for the first
projection (“$\pi_1$”). It therefore comes as no surprise that we cannot replace
downward 1-equivalence by weak downward equivalence there. Indeed, consider the
expression $e := \pi_1({\downarrow}/(\varepsilon - \pi_1({\downarrow})))$. In the
example document of Example 5.22 shown in Figure 4, $v_1 \gtrless_{\downarrow} v_2$,
and, hence, $(v_1, v_1) \gtreqqless_{\downarrow} (v_2, v_2)$. Moreover,
$(v_1, v_1) \in e(D)$, whereas $(v_2, v_2) \notin e(D)$. However, we can “save”
Proposition 5.2 by replacing downward 1-equivalence by downward relatedness,
provided we omit set difference from the set of operations of the language.
Indeed, we can then recover the proof, using Lemma 5.26 instead of Lemma 5.1.
(Notice that, for the induction step for set difference in the original proof, we
must exploit equivalence in both directions to deal with the negation inherent
to the difference operation.) In summary, we have the following.
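To see the counterexample expression at work, one can evaluate $\pi_1({\downarrow}/(\varepsilon - \pi_1({\downarrow})))$ directly as a binary relation. The sketch below is a minimal evaluator written for this purpose only; the document `CHILDREN` is a hypothetical tree of our own (assumed consistent with Example 5.22, not taken from Figure 4), and the helper names are ours.

```python
from typing import Dict, List, Set, Tuple

Rel = Set[Tuple[str, str]]

# Hypothetical document (an assumption); labels play no role here.
CHILDREN: Dict[str, List[str]] = {
    "r": ["v1", "v2"],
    "v1": ["x1", "y1"],
    "v2": ["x2"],
    "x1": ["z1"],
    "x2": ["z2"],
    "y1": [],
    "z1": [],
    "z2": [],
}


def identity() -> Rel:
    """The identity relation (epsilon) on all nodes."""
    return {(v, v) for v in CHILDREN}


def down() -> Rel:
    """Downward navigation: all parent-child pairs."""
    return {(p, c) for p, kids in CHILDREN.items() for c in kids}


def compose(a: Rel, b: Rel) -> Rel:
    """Composition a/b."""
    return {(x, z) for (x, y) in a for (y2, z) in b if y == y2}


def pi1(a: Rel) -> Rel:
    """First projection: identity pairs on nodes with a nonempty image."""
    return {(x, x) for (x, _) in a}


# epsilon - pi1(down) keeps exactly the leaves; the whole expression
# therefore selects the nodes that have at least one leaf child.
leaves = identity() - pi1(down())
e = pi1(compose(down(), leaves))
```

Here `e` contains `("v1", "v1")` (because `y1` is a leaf child of `v1`) but not `("v2", "v2")`, although the two nodes are weakly downward-equivalent; the set difference is what tells them apart.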


Lemma 5.27. Let $E$ be the set of all nonbasic operations in Table 1, except for
upward navigation (“$\uparrow$”), second projection (“$\pi_2$”), inverse (“$^{-1}$”),
selection on at least $k$ children satisfying some condition
(“$\mathrm{ch}_{\geq k}(\cdot)$”) for $k > 1$, and set difference (“$-$”). Let $e$ be
an expression in $\mathcal{X}(E)$. Let $D = (V, Ed, r, \lambda)$ be a document, let
$v_1$, $w_1$, $v_2$, and $w_2$ be nodes of $D$ such that $w_1$ is a descendant of
$v_1$ and $w_2$ is a descendant of $v_2$. Assume furthermore that
$(v_1, w_1) \geqq_{\downarrow} (v_2, w_2)$. Then, $(v_1, w_1) \in e(D)$ implies
$(v_2, w_2) \in e(D)$.

Two applications of Lemma 5.27 immediately yield the following.

Proposition 5.28. Let $E$ be the set of all nonbasic operations in Table 1, except
for upward navigation (“$\uparrow$”), second projection (“$\pi_2$”), inverse
(“$^{-1}$”), selection on at least $k$ children satisfying some condition
(“$\mathrm{ch}_{\geq k}(\cdot)$”) for $k > 1$, and set difference (“$-$”). Let $e$ be
an expression in $\mathcal{X}(E)$. Let $D = (V, Ed, r, \lambda)$ be a document, let
$v_1$, $w_1$, $v_2$, and $w_2$ be nodes of $D$ such that $w_1$ is a descendant of
$v_1$ and $w_2$ is a descendant of $v_2$. Assume furthermore that
$(v_1, w_1) \gtreqqless_{\downarrow} (v_2, w_2)$. Then, $(v_1, w_1) \in e(D)$ if and
only if $(v_2, w_2) \in e(D)$.

So, Proposition 5.28 is weaker than Proposition 5.2 in the sense that we had
to exclude set difference, but stronger in the sense that, in return, we were able
to replace the precondition by a weaker one.

The analogues of Corollaries 5.3 and 5.4 are now as follows.

Corollary 5.29. Let $E$ be the set of all nonbasic operations in Table 1, except
for upward navigation (“$\uparrow$”), second projection (“$\pi_2$”), inverse
(“$^{-1}$”), selection on at least $k$ children satisfying some condition
(“$\mathrm{ch}_{\geq k}(\cdot)$”) for $k > 1$, and set difference (“$-$”). Let $e$ be
an expression in $\mathcal{X}(E)$. Let $D = (V, Ed, r, \lambda)$ be a document, let
$v_1$ and $v_2$ be nodes of $D$ such that $v_1 \geq_{\downarrow} v_2$, and let $w_1$ be
a descendant of $v_1$. If $(v_1, w_1) \in e(D)$, then there exists a descendant $w_2$
of $v_2$ such that $(v_2, w_2) \in e(D)$.

In other words, downward relatedness implies expression relatedness.


Corollary 5.30. Let $E$ be a set of nonbasic operations not containing upward
navigation (“$\uparrow$”), second projection (“$\pi_2$”), inverse (“$^{-1}$”),
selection on at least $k$ children satisfying some condition
(“$\mathrm{ch}_{\geq k}(\cdot)$”) for $k > 1$, and set difference (“$-$”). Consider
the language $\mathcal{X}(E)$ or $\mathcal{C}(E)$. Let $D = (V, Ed, r, \lambda)$ be a
document, and let $v_1$ and $v_2$ be nodes of $D$. If $v_1 \gtrless_{\downarrow} v_2$,
then $v_1 \equiv_{\exp} v_2$.

Proof. The condition $v_1 \gtrless_{\downarrow} v_2$ implies $v_1 \geq_{\downarrow} v_2$
and $v_2 \geq_{\downarrow} v_1$. By Corollary 5.29, these conditions in turn imply
$v_1 \geq_{\exp} v_2$ and $v_2 \geq_{\exp} v_1$, which together are equivalent to
$v_1 \equiv_{\exp} v_2$. □


We now look at necessary conditions for expression equivalence for strictly
downward languages not containing set difference. Provided intersection (“$\cap$”)
is available, the expressibility of set difference is used only once in the proof of
Proposition 5.6, namely where Proposition 4.2 is invoked. We do not need this
proposition, however, in the following variation of Proposition 5.6.


Lemma 5.31. Let $E$ be a set of nonbasic operations containing first projection
(“$\pi_1$”) and intersection (“$\cap$”). Consider the language $\mathcal{X}(E)$ or
$\mathcal{C}(E)$. Let $D = (V, Ed, r, \lambda)$ be a document, and let $v_1$ and $v_2$
be nodes of $D$. If $v_1 \geq_{\exp} v_2$, then $v_1 \geq_{\downarrow} v_2$.

Two applications of Lemma 5.31 immediately yield the following.


Proposition 5.32. Let $E$ be a set of nonbasic operations containing first
projection (“$\pi_1$”) and intersection (“$\cap$”). Consider the language
$\mathcal{X}(E)$ or $\mathcal{C}(E)$. Let $D = (V, Ed, r, \lambda)$ be a document, and
let $v_1$ and $v_2$ be nodes of $D$. If $v_1 \equiv_{\exp} v_2$, then
$v_1 \gtrless_{\downarrow} v_2$.


So, Proposition 5.32 is weaker than Proposition 5.6 in the sense that the
conclusion is replaced by a weaker one, but stronger in the sense that, in return,
we no longer have to rely on the presence of difference.

The languages containing downward navigation (“$\downarrow$”) and satisfying both
Corollary 5.30 and Proposition 5.32 are $\mathcal{X}(\downarrow, \pi_1, \cap)$ and
$\mathcal{C}(\downarrow, \pi_1, \cap)$, which, moreover, are equivalent, by
Corollary 5.20. In addition, we can eliminate intersection operations, except
those used as operands of boolean combinations of subexpressions inside a
projection operation, without losing expressive power. We call these languages
the strictly downward positive XPath algebra and the strictly downward core
positive XPath algebra, respectively. Combining the aforementioned results, we
get the following.


Theorem 5.33. Consider the strictly downward (core) positive XPath algebra.
Let $D = (V, Ed, r, \lambda)$ be a document, and let $v_1$ and $v_2$ be nodes of $D$.
Then $v_1 \equiv_{\exp} v_2$ if and only if $v_1 \gtrless_{\downarrow} v_2$.

We finally turn to the characterization of navigational expressiveness.
Proposition 5.9 and its proof, and hence also Corollary 5.10, carry over to the
current setting.

Theorem 5.34. Consider the strictly downward (core) positive XPath algebra.
Let $D = (V, Ed, r, \lambda)$ be a document, and let $v_1$, $w_1$, $v_2$, and $w_2$ be
nodes of $D$ such that $w_1$ is a descendant of $v_1$ and $w_2$ is a descendant of
$v_2$. Then, the property that, for each expression $e$ in the language under
consideration, $(v_1, w_1) \in e(D)$ if and only if $(v_2, w_2) \in e(D)$ is
equivalent to the property $(v_1, w_1) \gtreqqless_{\downarrow} (v_2, w_2)$.

To derive a BP-result for the strictly downward (core) positive XPath algebra,
we observe that Lemmas 5.11 and 5.12 and Theorem 5.13 carry over to the current
context, provided we replace downward 1-equivalence by downward relatedness.


Lemma 5.35. Let $D = (V, Ed, r, \lambda)$ be a document.

1. Let $v_1$ be a node of $D$. There exists an expression $e_{v_1}$ in the strictly
downward (core) positive XPath algebra such that, for each node $v_2$ of $D$,
$e_{v_1}(D)(v_2) \neq \emptyset$ if and only if $v_1 \geq_{\downarrow} v_2$.

2. Let $v_1$ and $w_1$ be nodes of $D$ such that $w_1$ is a descendant of $v_1$.
There exists an expression $e_{v_1, w_1}$ in the strictly downward (core) positive
XPath algebra such that, for all nodes $v_2$ and $w_2$ of $D$ with $w_2$ a
descendant of $v_2$, $(v_2, w_2) \in e_{v_1, w_1}(D)$ if and only if
$(v_1, w_1) \geqq_{\downarrow} (v_2, w_2)$.

In the proof of the first claim, the role of Theorem 5.7 is taken over by
Corollary 5.29.


Theorem 5.36. Let $D = (V, Ed, r, \lambda)$ be a document, and let
$R \subseteq V \times V$. Then, there exists an expression $e$ in the strictly
downward (core) positive XPath algebra such that $e(D) = R$ if and only if,

1. for all $v, w \in V$, $(v, w) \in R$ implies $w$ is a descendant of $v$; and,

2. for all $v_1, w_1, v_2, w_2 \in V$ with $w_1$ a descendant of $v_1$, $w_2$ a
descendant of $v_2$, and $(v_1, w_1) \geqq_{\downarrow} (v_2, w_2)$,
$(v_1, w_1) \in R$ implies $(v_2, w_2) \in R$.


The major difference between Theorems 5.13 and 5.36 is that, in the former,
$R$ is a partition of maximal sets of $\cong^{k}_{\downarrow}$-congruent nodes, while,
in the latter, $R$ is merely closed under $\geqq_{\downarrow}$-congruence.

We can also recast Theorem 5.36 in terms of node-level navigation, in much
the same way as Theorem 5.13.


Theorem 5.37. Let $D = (V, Ed, r, \lambda)$ be a document, let $v$ be a node of $D$,
and let $W \subseteq V$. Then there exists an expression $e$ in the strictly downward
(core) positive XPath algebra such that $e(D)(v) = W$ if and only if all nodes of
$W$ are descendants of $v$, and, for all nodes $w_1$ and $w_2$ of $D$ with
$(v, w_1) \geqq_{\downarrow} (v, w_2)$, $w_1 \in W$ implies $w_2 \in W$.

Corollary 5.38. Let $D = (V, Ed, r, \lambda)$ be a document, and let
$W \subseteq V$. Then there exists an expression $e$ in the strictly downward (core)
positive XPath algebra such that $e(D)(r) = W$ if and only if, for all nodes $w_1$
and $w_2$ of $D$ with $w_1 \geq_{\updownarrow} w_2$, $w_1 \in W$ implies $w_2 \in W$.


For the last result of this section, we relied on Proposition 5.25.
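The criterion of Corollary 5.38 is easy to test for a concrete candidate set $W$: it suffices to check that $W$ is closed under relatedness. The sketch below does this by brute force. It rests on several assumptions of ours: the document is a hypothetical, uniformly labeled tree (not Figure 4), and the relation used is "related" in the sense of Definition 5.23.

```python
from typing import Dict, List, Optional, Set

# Hypothetical, uniformly labeled document (an assumption).
CHILDREN: Dict[str, List[str]] = {
    "r": ["v1", "v2"],
    "v1": ["x1", "y1"],
    "v2": ["x2"],
    "x1": ["z1"],
    "x2": ["z2"],
    "y1": [],
    "z1": [],
    "z2": [],
}
PARENT: Dict[str, Optional[str]] = {"r": None}
for parent, kids in CHILDREN.items():
    for kid in kids:
        PARENT[kid] = parent


def downward_related(v1: str, v2: str) -> bool:
    """Definition 5.21(1), with the label test omitted (uniform labels)."""
    return all(
        any(downward_related(w1, w2) for w2 in CHILDREN[v2])
        for w1 in CHILDREN[v1]
    )


def related(v1: str, v2: str) -> bool:
    """Definition 5.23(1)."""
    if not downward_related(v1, v2):
        return False
    u1, u2 = PARENT[v1], PARENT[v2]
    if (u1 is None) != (u2 is None):
        return False
    return u1 is None or related(u1, u2)


def closed_under_relatedness(w: Set[str]) -> bool:
    """Condition of Corollary 5.38: whenever w1 is in W and w1 is related
    to w2, then w2 must be in W as well."""
    return all(w2 in w for w1 in w for w2 in CHILDREN if related(w1, w2))
```

For instance, `{"x1", "x2"}` passes the test, whereas `{"y1"}` does not: `y1` is related to `x2`, so any positive strictly downward expression that reaches `y1` from the root must also reach `x2`.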


6. Weakly downward languages 

We now turn to weakly downward languages: for any node $v$ of the document
$D$ under consideration, all nodes in $e(D)(v)$ are descendants of $v$, but there
are possibly nodes $v$ for which $e(D)(v) \neq e(D')(v)$, with $D'$ the subtree of
$D$ rooted at $v$.


6.1. Sufficient conditions for expression-equivalence

The key notion in Sections 6.1–6.3 is $\cong^{k}_{\updownarrow}$-congruence,
$k \geq 1$, restricted to ancestor-descendant pairs. We first explore some
properties of this notion.


Lemma 6.1. Let $D = (V, Ed, r, \lambda)$ be a document, let $v_1$, $w_1$, $v_2$, and
$w_2$ be nodes of $D$ such that $w_1$ is a descendant of $v_1$ and $w_2$ is a
descendant of $v_2$, and let $k \geq 1$. Then,
$(v_1, w_1) \cong^{k}_{\updownarrow} (v_2, w_2)$ if and only if
$(v_1, w_1) \cong (v_2, w_2)$ and $w_1 \equiv^{k}_{\updownarrow} w_2$.


Proof. As the “only if” is obvious, we focus on the “if.” By Proposition 4.17,
$w_1 \equiv^{k}_{\updownarrow} w_2$ implies that
$(r, w_1) \cong^{k}_{\downarrow} (r, w_2)$. Let $y_1$ be a node on the path from $v_1$
to $w_1$, and let $y_2$ be the corresponding node on the path from $v_2$ to $w_2$.
By Proposition 4.15, we also have that $(r, y_1) \cong^{k}_{\downarrow} (r, y_2)$. By
another application of Proposition 4.17, we finally deduce that
$y_1 \equiv^{k}_{\updownarrow} y_2$. □


Lemma 6.2. Let $D = (V, Ed, r, \lambda)$ be a document, let $v_1$ and $w_1$ be nodes
of $D$ such that $w_1$ is a descendant of $v_1$, and let $k \geq 1$. Then,

1. each node $v_2$ of $D$ for which $v_1 \equiv^{k}_{\updownarrow} v_2$ has a
descendant $w_2$ in $D$ such that $(v_1, w_1) \cong^{k}_{\updownarrow} (v_2, w_2)$; and

2. each node $w_2$ of $D$ for which $w_1 \equiv^{k}_{\updownarrow} w_2$ has an
ancestor $v_2$ in $D$ such that $(v_1, w_1) \cong^{k}_{\updownarrow} (v_2, w_2)$.



$w_1$. If $v_1 = w_1$, then obviously, we must choose $v_2 := w_2$. If
$v_1 \neq w_1$, we have in particular that $w_1 \neq r$, and, hence, by
$w_1 \equiv^{k}_{\updownarrow} w_2$, that $w_2 \neq r$. Let $y_1$ be the parent of
$w_1$ and $y_2$ be the parent of $w_2$. By definition,
$y_1 \equiv^{k}_{\updownarrow} y_2$, and, by the induction hypothesis, there is a node
$v_2$ in $D$ such that $(v_1, y_1) \cong^{k}_{\updownarrow} (v_2, y_2)$. It now readily
follows that $(v_1, w_1) \cong^{k}_{\updownarrow} (v_2, w_2)$. □

We now link $\cong^{k}_{\updownarrow}$-congruence of ancestor-descendant pairs of
nodes with expressibility in weakly downward languages.

Proposition 6.3. Let $k \geq 1$, and let $E$ be the set of all nonbasic operations
in Table 1, except for upward navigation (“$\uparrow$”), inverse (“$^{-1}$”), and
selection on at least $m$ children satisfying some condition
(“$\mathrm{ch}_{\geq m}(\cdot)$”) for $m > k$. Let $e$ be an expression in
$\mathcal{X}(E)$. Let $D = (V, Ed, r, \lambda)$ be a document, and let $v_1$, $w_1$,
$v_2$, and $w_2$ be nodes of $D$ such that $w_1$ is a descendant of $v_1$ and $w_2$
is a descendant of $v_2$. Assume furthermore that
$(v_1, w_1) \cong^{k}_{\updownarrow} (v_2, w_2)$. Then, $(v_1, w_1) \in e(D)$ if and
only if $(v_2, w_2) \in e(D)$.

Proof. The proof goes along the same lines as the proof of Proposition 5.2.
Actually, since $\cong^{k}_{\updownarrow}$-congruence implies
$\cong^{k}_{\downarrow}$-congruence, almost all of the proof by structural induction
can be used here verbatim, except, of course, for the inductive step for the
second projection (“$\pi_2$”), which we consider next. Thus, let $e := \pi_2(f)$,
with $f$ satisfying Proposition 6.3. If $(v_1, w_1) \in \pi_2(f)(D)$, then, of
course, $v_1 = w_1$, as a consequence of which $v_2 = w_2$. Also, there exists
$y_1 \in V$ such that $(y_1, v_1) \in f(D)$. By Lemma 6.2 (2), there exists
$y_2 \in V$ such that $(y_1, v_1) \cong^{k}_{\updownarrow} (y_2, v_2)$. By the
induction hypothesis, $(y_2, v_2) \in f(D)$. Hence, $(v_2, v_2) \in \pi_2(f)(D)$. □

By combining Proposition 6.3 with Lemma 6.2, we can establish the following.

Corollary 6.4. Let $k \geq 1$, and let $E$ be the set of all nonbasic operations in
Table 1, except for upward navigation (“$\uparrow$”), inverse (“$^{-1}$”), and
selection on at least $m$ children (“$\mathrm{ch}_{\geq m}(\cdot)$”) for $m > k$.
Let $e$ be an expression in $\mathcal{X}(E)$. Let $D = (V, Ed, r, \lambda)$ be a
document, let $v_1$ and $w_1$ be nodes of $D$ such that $w_1$ is a descendant of
$v_1$ and $(v_1, w_1) \in e(D)$. Then,

1. each node $v_2$ of $D$ for which $v_1 \equiv^{k}_{\updownarrow} v_2$ has a
descendant $w_2$ in $D$ such that $(v_2, w_2) \in e(D)$; and

2. each node $w_2$ of $D$ for which $w_1 \equiv^{k}_{\updownarrow} w_2$ has an
ancestor $v_2$ in $D$ such that $(v_2, w_2) \in e(D)$.

Finally, we infer the following from Corollary 6.4:

Corollary 6.5. Let $k \geq 1$, and let $E$ be a set of nonbasic operations not
containing upward navigation (“$\uparrow$”), inverse (“$^{-1}$”), and selection on at
least $m$ children satisfying some condition (“$\mathrm{ch}_{\geq m}(\cdot)$”) for
$m > k$. Consider the language $\mathcal{X}(E)$ or $\mathcal{C}(E)$. Let
$D = (V, Ed, r, \lambda)$ be a document, and let $v_1$ and $v_2$ be nodes of $D$. If
$v_1 \equiv^{k}_{\updownarrow} v_2$, then $v_1 \equiv_{\exp} v_2$.
6.2. Necessary conditions for expression equivalence

We now explore requirements on the set of nonbasic operations expressible
in the language under which downward-$k$-equivalence ($k \geq 1$) is a necessary
condition for expression-equivalence. As we have endeavored to make as few
assumptions as possible, Proposition 6.6 also holds for a class of languages that
are not downward.


Proposition 6.6. Let $k \geq 1$, and let $E$ be a set of nonbasic operations
containing at least one navigation operation (“$\downarrow$” or “$\uparrow$”) and set
difference (“$-$”). Consider the language $\mathcal{X}(E)$ or $\mathcal{C}(E)$, and
assume that, in this language, first and second projection (“$\pi_1$” and
“$\pi_2$”) can be expressed, as well as selection on at least $m$ children
satisfying some condition (“$\mathrm{ch}_{\geq m}(\cdot)$”), for all
$m = 1, \ldots, k$. Let $D = (V, Ed, r, \lambda)$ be a document, and let $v_1$ and
$v_2$ be nodes of $D$. If $v_1 \equiv_{\exp} v_2$, then
$v_1 \equiv^{k}_{\updownarrow} v_2$.


Proof. Without loss of generality, we may assume that the language under
consideration is $\mathcal{C}(E)$. In Proposition 5.6, we have already established
that $v_1 \equiv_{\exp} v_2$ implies $v_1 \equiv^{k}_{\downarrow} v_2$. By induction on
the length of the path from $r$ to $v_1$, we establish that, furthermore,
$v_1 \equiv^{k}_{\updownarrow} v_2$. For the basis of the induction, consider the case
that $v_1 = r$. Let $d$ be the length of a longest path from $r$ to a leaf of $D$
(i.e., the height of the tree). We distinguish two cases:

1. $\downarrow \in E$. Then, ${\downarrow}^{d}(D)(v_1) \neq \emptyset$. Hence,
${\downarrow}^{d}(D)(v_2) \neq \emptyset$, which implies $v_2 = r$.

2. $\uparrow \in E$. Then, $\pi_2({\uparrow}^{d})(D)(v_1) \neq \emptyset$. Hence,
$\pi_2({\uparrow}^{d})(D)(v_2) \neq \emptyset$, which implies $v_2 = r$.

In both cases, it follows that $v_1 \equiv^{k}_{\updownarrow} v_2$. For the induction
step, consider the case that $v_1 \neq r$. Again, we distinguish two cases:

1. $\downarrow \in E$. Then, $\pi_2({\downarrow})(D)(v_1) \neq \emptyset$, and, hence,
$\pi_2({\downarrow})(D)(v_2) \neq \emptyset$. So, $v_2 \neq r$.

2. $\uparrow \in E$. Then, ${\uparrow}(D)(v_1) \neq \emptyset$, and, hence,
${\uparrow}(D)(v_2) \neq \emptyset$. So, $v_2 \neq r$.

Now, let $u_1$ be the parent of $v_1$ and $u_2$ be the parent of $v_2$. We show that
$u_1 \equiv_{\exp} u_2$. Thereto, let $e$ be an expression in the language under
consideration for which $e(D)(u_1) \neq \emptyset$. Again, we distinguish two cases:

1. $\downarrow \in E$. Then, $\pi_2(e/{\downarrow})(D)(v_1) \neq \emptyset$. Since
$v_1 \equiv_{\exp} v_2$, $\pi_2(e/{\downarrow})(D)(v_2) \neq \emptyset$. It follows that
$e(D)(u_2) \neq \emptyset$.

2. $\uparrow \in E$. Then, ${\uparrow}/e(D)(v_1) \neq \emptyset$. Since
$v_1 \equiv_{\exp} v_2$, ${\uparrow}/e(D)(v_2) \neq \emptyset$. It follows that
$e(D)(u_2) \neq \emptyset$.

By the induction hypothesis, we may now conclude that, in both cases,
$u_1 \equiv^{k}_{\updownarrow} u_2$. Hence, also $v_1 \equiv^{k}_{\updownarrow} v_2$. □
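The base case of this proof uses the fact that ${\downarrow}^{d}$, with $d$ the height of the tree, has a nonempty image only at the root, and the induction step uses the fact that $\pi_2({\downarrow})$ holds exactly at the non-root nodes. The sketch below checks both facts on a hypothetical document of our own choosing; the helper names are ours and are not part of the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical document (an assumption); "r" is the root.
CHILDREN: Dict[str, List[str]] = {
    "r": ["v1", "v2"],
    "v1": ["x1", "y1"],
    "v2": ["x2"],
    "x1": ["z1"],
    "x2": ["z2"],
    "y1": [],
    "z1": [],
    "z2": [],
}


def height(v: str) -> int:
    """Length of a longest path from v down to a leaf."""
    return 0 if not CHILDREN[v] else 1 + max(height(c) for c in CHILDREN[v])


def down_power(n: int) -> Set[Tuple[str, str]]:
    """The n-fold composition of downward navigation: all pairs (v, w)
    such that w lies exactly n levels below v."""
    rel = {(v, v) for v in CHILDREN}
    for _ in range(n):
        rel = {(x, c) for (x, y) in rel for c in CHILDREN[y]}
    return rel


d = height("r")

# Nodes v with a nonempty image under down^d: only the root qualifies.
roots = {v for (v, _) in down_power(d)}

# Nodes selected by pi_2(down): exactly the nodes that have a parent.
non_roots = {w for (_, w) in down_power(1)}
```

Both tests are purely positive, which is why the root and non-root properties remain expressible even in the absence of set difference, provided second projection is available for the latter.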


We see that Proposition 6.6 is applicable to weakly downward languages as well
as to weakly upward languages (see Section 7.2). We shall see in Section 7.2
that this is no coincidence. For now, we suffice with concluding that
$k$-equivalence is a necessary condition for expression-equivalence under a weakly
downward language containing downward navigation (“$\downarrow$”), both projections
(“$\pi_1$” and “$\pi_2$”), and set difference (“$-$”), provided selection on at
least $m$ children satisfying some condition (“$\mathrm{ch}_{\geq m}(\cdot)$”) for
all $m = 1, \ldots, k$ can be expressed.
6.3. Characterization of expression equivalence

The weakly downward languages containing downward navigation (“$\downarrow$”) and
satisfying both Corollary 6.5 of Subsection 6.1 and Proposition 5.6 of
Subsection 5.2 are

$$\mathcal{X}(\downarrow, \pi_1, \pi_2, \mathrm{ch}_{\geq 1}(\cdot), \ldots, \mathrm{ch}_{\geq k}(\cdot), -) \quad\text{and}\quad \mathcal{C}(\downarrow, \pi_1, \pi_2, \mathrm{ch}_{\geq 1}(\cdot), \ldots, \mathrm{ch}_{\geq k}(\cdot), -),$$

which, moreover, are equivalent, by Corollary 5.20. In addition, we can eliminate
set difference or intersection operations, except those used as operands of
boolean combinations of subexpressions inside a projection operation, without
losing expressive power. We call these languages the weakly downward XPath
algebra with counting up to $k$ and the weakly downward core XPath algebra with
counting up to $k$, respectively. Combining the aforementioned results, we get
the following.

Theorem 6.7. Let $k \geq 1$, and consider the weakly downward (core) XPath
algebra with counting up to $k$. Let $D = (V, Ed, r, \lambda)$ be a document, and let
$v_1$ and $v_2$ be nodes of $D$. Then $v_1 \equiv_{\exp} v_2$ if and only if
$v_1 \equiv^{k}_{\updownarrow} v_2$.

A special case arises when $k = 1$, since selection on at least one child
satisfying some condition (“$\mathrm{ch}_{\geq 1}(\cdot)$”) can be expressed in
terms of the other operations
required by Theorem |6.7| by Proposition [2^ The languages we then obtain are 
called the weakly downward XPath algebra and the weakly downward core XPath 
algebra, respectively. We have the following. 

Corollary 6.8. Consider the weakly downward (core) XPath algebra. Let
$D = (V, Ed, r, \lambda)$ be a document, and let $v_1$ and $v_2$ be nodes of $D$. Then
$v_1 \equiv_{\exp} v_2$ if and only if $v_1 \equiv^{1}_{\updownarrow} v_2$.


6.4. Characterization of navigational expressiveness

We start by proving a converse to Proposition 6.3.


Proposition 6.9. Let $k \geq 1$, and let $E$ be a set of nonbasic operations
containing downward navigation (“$\downarrow$”) and set difference (“$-$”). Consider
the language $\mathcal{X}(E)$ or $\mathcal{C}(E)$. Assume that, in this language, first
and second projection (“$\pi_1$” and “$\pi_2$”) can be expressed, as well as
selection on at least $m$ children satisfying some condition
(“$\mathrm{ch}_{\geq m}(\cdot)$”), for all $m = 1, \ldots, k$. Let
$D = (V, Ed, r, \lambda)$ be a document, and let $v_1$, $w_1$, $v_2$, and $w_2$ be
nodes of $D$ such that $w_1$ is a descendant of $v_1$ and $w_2$ is a descendant of
$v_2$. Assume furthermore that, for each expression $e$ in the language,
$(v_1, w_1) \in e(D)$ if and only if $(v_2, w_2) \in e(D)$. Then
$(v_1, w_1) \cong^{k}_{\updownarrow} (v_2, w_2)$.


Proof. From Proposition 5.9, we already know that
$(v_1, w_1) \cong^{k}_{\downarrow} (v_2, w_2)$. In particular,
$(v_1, w_1) \cong (v_2, w_2)$. By Lemma 6.1, it suffices to prove that
$w_1 \equiv^{k}_{\updownarrow} w_2$, or, by Proposition 4.17, that
$(r, w_1) \cong^{k}_{\downarrow} (r, w_2)$. In view of what we already know, we only
need to show that $(r, v_1) \cong^{k}_{\downarrow} (r, v_2)$. Since
$(v_1, w_1) \in \pi_2(\mathrm{sig}(r, v_1))/\mathrm{sig}(v_1, w_1)(D)$, it follows that
also $(v_2, w_2) \in \pi_2(\mathrm{sig}(r, v_1))/\mathrm{sig}(v_1, w_1)(D)$, from which
we readily deduce that $(r, v_1) \cong (r, v_2)$. Let $u_1$ be a node on the path
from $r$ to $v_1$, and let $u_2$ be the corresponding node on the path from $r$ to
$v_2$. Then, $(r, u_1) \cong (r, u_2)$ and $(u_1, v_1) \cong (u_2, v_2)$. Now, let
$f$ be any expression in the language such that $f(D)(u_1) \neq \emptyset$. Then,
$(u_1, u_1) \in \pi_1(f)(D)$. Let
$e := \pi_2(\pi_1(f)/\mathrm{sig}(u_1, v_1))/\mathrm{sig}(v_1, w_1)$. By construction,
$(v_1, w_1) \in e(D)$. Hence, by assumption, $(v_2, w_2) \in e(D)$, which implies
$(u_2, u_2) \in \pi_1(f)(D)$, or $f(D)(u_2) \neq \emptyset$. The same holds vice
versa, and we may thus conclude that $u_1 \equiv_{\exp} u_2$, and, hence, by
Proposition 5.6, $u_1 \equiv^{k}_{\downarrow} u_2$. We may thus conclude that
$(r, v_1) \cong^{k}_{\downarrow} (r, v_2)$. □


Combining Propositions 6.3 and 6.9, we obtain the following.


Corollary 6.10. Let $k \geq 1$, and consider the weakly downward (core) XPath
algebra with counting up to $k$. Let $D = (V, Ed, r, \lambda)$ be a document, and let
$v_1$, $w_1$, $v_2$, and $w_2$ be nodes of $D$ such that $w_1$ is a descendant of
$v_1$ and $w_2$ is a descendant of $v_2$. Then, the property that, for each
expression $e$ in the language under consideration, $(v_1, w_1) \in e(D)$ if and
only if $(v_2, w_2) \in e(D)$ is equivalent to the property
$(v_1, w_1) \cong^{k}_{\updownarrow} (v_2, w_2)$.


From here on, the derivation of a BP-result for the weakly downward (core)
XPath algebra with counting up to $k$ follows the development in Section 5.4
very closely, which is why we only state the final results.


Theorem 6.11. Let $k \geq 1$. Let $D = (V, Ed, r, \lambda)$ be a document, and let
$R \subseteq V \times V$. Then, there exists an expression $e$ in the weakly
downward (core) XPath algebra with counting up to $k$ such that $e(D) = R$ if and
only if,

1. for all $v, w \in V$, $(v, w) \in R$ implies $w$ is a descendant of $v$; and,

2. for all $v_1, w_1, v_2, w_2 \in V$ with $w_1$ a descendant of $v_1$, $w_2$ a
descendant of $v_2$, and $(v_1, w_1) \cong^{k}_{\updownarrow} (v_2, w_2)$,
$(v_1, w_1) \in R$ implies $(v_2, w_2) \in R$.

The specialization to the weakly downward (core) XPath algebra is as follows.


Corollary 6.12. Let $D = (V, Ed, r, \lambda)$ be a document, and let
$R \subseteq V \times V$. There exists an expression $e$ in the weakly downward
(core) XPath algebra such that $e(D) = R$ if and only if,

1. for all $v, w \in V$, $(v, w) \in R$ implies $w$ is a descendant of $v$; and,

2. for all $v_1, w_1, v_2, w_2 \in V$ with $w_1$ a descendant of $v_1$, $w_2$ a
descendant of $v_2$, and $(v_1, w_1) \cong^{1}_{\updownarrow} (v_2, w_2)$,
$(v_1, w_1) \in R$ implies $(v_2, w_2) \in R$.

We recast Theorem 6.11 and Corollary 6.12 in terms of node-level navigation.

Theorem 6.13. Let $k \geq 1$. Let $D = (V, Ed, r, \lambda)$ be a document, let $v$ be
a node of $D$, and let $W \subseteq V$. Then there exists an expression $e$ in the
weakly downward (core) XPath algebra with counting up to $k$ such that
$e(D)(v) = W$ if and only if all nodes of $W$ are descendants of $v$, and, for all
$w_1, w_2 \in V$ with $(v, w_1) \cong^{k}_{\updownarrow} (v, w_2)$, $w_1 \in W$ implies
$w_2 \in W$.











Corollary 6.14. Let $D = (V, Ed, r, \lambda)$ be a document, let $v$ be a node of
$D$, and let $W \subseteq V$. Then there exists an expression $e$ in the weakly
downward (core) XPath algebra such that $e(D)(v) = W$ if and only if all nodes of
$W$ are descendants of $v$, and, for all $w_1, w_2 \in V$ with
$(v, w_1) \cong^{1}_{\updownarrow} (v, w_2)$, $w_1 \in W$ implies $w_2 \in W$.


For $v = r$, the condition $(v, w_1) \cong^{k}_{\updownarrow} (v, w_2)$ reduces to
$w_1 \equiv^{k}_{\updownarrow} w_2$, by Proposition 4.17 and Lemma 6.1. Comparing
Theorem 6.13 and Corollary 6.14 with, respectively, Theorem 5.17 and
Corollary 5.18 then immediately yields the following.


Theorem 6.15. Let $D = (V, Ed, r, \lambda)$ be a document.

1. For each expression $e$ in the weakly downward (core) XPath algebra with
counting up to $k$, $k \geq 1$, there exists an expression $e'$ in the strictly
downward (core) XPath algebra with counting up to $k$ such that
$e(D)(r) = e'(D)(r)$; in particular,

2. for each expression $e$ in the weakly downward (core) XPath algebra, there
exists an expression $e'$ in the strictly downward (core) XPath algebra such
that $e(D)(r) = e'(D)(r)$.


Hence, the corresponding weakly downward and strictly downward languages 
are navigationally equivalent if navigation always starts from the root. 


6.5. Weakly downward languages not containing set difference 

To find characterizations for weakly downward languages not containing set 
difference, we can proceed in two ways: 


1. we proceed as in Section 5.5.2 for strictly downward languages without
set difference, i.e., we review the results in Sections 6.1–6.3 and examine
to which extent these results, in the case where $k = 1$, allow replacing
1-equivalence by relatedness (Definition 5.23); or

2. we start from the results in Section 5.5.2 on strictly downward languages
without set difference and “bootstrap” them to results on weakly downward
languages without set difference in the same way as the results on strictly
downward languages with set difference in Sections 5.1–5.3 were bootstrapped
to results on weakly downward languages with set difference in
Sections 6.1–6.3.


Of course, both approaches lead to the same results. As the necessary
intermediate lemmas and all the proofs can readily be deduced in one of the two
ways described above, we limit ourselves to giving the main results. Only one
technical subtlety deserves mentioning here: despite the absence of difference,
both the property that a node is the root and the property that a node is not
the root can be expressed, the latter using second projection. For more details,
we refer to the proof of Proposition 6.6.

Concretely, the languages for which we provide characterizations in this
section are $\mathcal{X}(\downarrow, \pi_1, \pi_2, \cap)$ and
$\mathcal{C}(\downarrow, \pi_1, \pi_2, \cap)$, which, moreover, are equivalent, by
Corollary 5.20. We call these languages the weakly downward positive XPath
algebra and the weakly downward core positive XPath algebra, respectively. In
addition, we can eliminate intersection altogether. This follows from an earlier
result by some of the present authors m- Although this result was stated in 
the context of languages that allow both downward and upward navigation, a 
careful examination of the elimination algorithm reveals that the results still 
hold in the absence of upward navigation. Thus, we have the following. 


Proposition 6.16. The weakly downward positive XPath algebra and the weakly
downward core positive XPath algebra are both equivalent to
$\mathcal{X}(\downarrow, \pi_1, \pi_2)$.

We now summarize the characterization results. 


Theorem 6.17. Consider the weakly downward (core) positive XPath algebra.
Let $D = (V, Ed, r, \lambda)$ be a document, and let $v_1$ and $v_2$ be nodes of $D$.
Then $v_1 \equiv_{\exp} v_2$ if and only if $v_1 \gtrless_{\updownarrow} v_2$.

Theorem 6.18. Consider the weakly downward (core) positive XPath algebra.
Let $D = (V, Ed, r, \lambda)$ be a document, and let $v_1$, $w_1$, $v_2$, and $w_2$ be
nodes of $D$ such that $w_1$ is a descendant of $v_1$ and $w_2$ is a descendant of
$v_2$. Then, the property that, for each expression $e$ in the language under
consideration, $(v_1, w_1) \in e(D)$ if and only if $(v_2, w_2) \in e(D)$ is
equivalent to the property $(v_1, w_1) \gtreqqless_{\updownarrow} (v_2, w_2)$.

Theorem 6.19. Let $D = (V, Ed, r, \lambda)$ be a document, and let
$R \subseteq V \times V$. Then, there exists an expression $e$ in the weakly
downward (core) positive XPath algebra such that $e(D) = R$ if and only if,

1. for all $v, w \in V$, $(v, w) \in R$ implies $w$ is a descendant of $v$; and,

2. for all $v_1, w_1, v_2, w_2 \in V$ with $w_1$ a descendant of $v_1$, $w_2$ a
descendant of $v_2$, and $(v_1, w_1) \geqq_{\updownarrow} (v_2, w_2)$,
$(v_1, w_1) \in R$ implies $(v_2, w_2) \in R$.

Corollary 6.20. Let $D = (V, Ed, r, \lambda)$ be a document, let $v$ be a node of
$D$, and let $W \subseteq V$. Then there exists an expression $e$ in the weakly
downward (core) positive XPath algebra such that $e(D)(v) = W$ if and only if all
nodes of $W$ are descendants of $v$, and, for all nodes $w_1$ and $w_2$ of $D$ with
$(v, w_1) \geqq_{\updownarrow} (v, w_2)$, $w_1 \in W$ implies $w_2 \in W$.

Corollary 6.21. Let $D = (V, Ed, r, \lambda)$ be a document, and let
$W \subseteq V$. Then there exists an expression $e$ in the weakly downward (core)
positive XPath algebra such that $e(D)(r) = W$ if and only if, for all nodes $w_1$
and $w_2$ of $D$ with $w_1 \geq_{\updownarrow} w_2$, $w_1 \in W$ implies $w_2 \in W$.

Hence, the weakly downward positive (core) XPath algebra and the strictly
downward positive (core) XPath algebra are navigationally equivalent if
navigation always starts from the root.
7. Upward languages

In analogy to downward languages, we call a language upward if, for any
expression $e$ in that language, and for any node $v$ of the document $D$ under
consideration, all nodes in $e(D)(v)$ are ancestors of $v$. If, in addition, it is
always the case that $e(D)(v) = e(D')(v)$, where $D'$ is the subtree of $D$
obtained by removing from $D$ all strict descendants of $v$, we call the language
strictly upward. Upward languages that are not strictly upward will be called
weakly upward.

For $E$ a set of nonbasic operations of Table 1, $\mathcal{X}(E)$ or
$\mathcal{C}(E)$ is upward if it does not contain downward navigation
(“$\downarrow$”) and inverse (“$^{-1}$”). Additionally, strictly upward languages do
not contain second projection (“$\pi_2$”) and counting operations
(“$\mathrm{ch}_{\geq k}(\cdot)$”).

Of course, there is a distinct asymmetry between strictly upward languages 
and strictly downward languages: while a node can have an arbitrary number 
of children, it has at most one parent, making the analysis of strictly upward 
languages much easier than the analysis of downward languages. We shall see, 
however, that this asymmetry disappears for weakly upward languages versus 
weakly downward languages. 

Finally, we observe that the analogues of Theorem 5.19 and Corollary 5.20
still hold for upward languages: set difference (“$-$”) and intersection (“$\cap$”) can
be eliminated, unless they are used as operations in a Boolean combination of
subexpressions of the language within a projection. Hence, an upward language
and its corresponding core language coincide.

7.1. Strictly upward languages 

The languages we consider here are $\mathcal{X}(\uparrow,\pi_1,-)$ and $\mathcal{C}(\uparrow,\pi_1,-)$, which are
equivalent, and $\mathcal{X}(\uparrow,\pi_1,\cap)$ and $\mathcal{C}(\uparrow,\pi_1,\cap)$, which are also equivalent. We refer
to these as the strictly upward (core) XPath algebra and the strictly upward
(core) positive XPath algebra, respectively. As the characterization results for
these languages are easy to derive along the lines set out earlier, we merely
summarize the results.

Theorem 7.1. Consider the strictly upward (core) (positive) XPath algebra. 
Let D = (V, Ed,r, X) be a document, and let vi and V 2 be nodes of D. Then 
vi =exp V 2 , if and only if vi V 2 . 

Theorem 7.2. Consider the strictly upward (core) (positive) XPath algebra.
Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $v_1$, $w_1$, $v_2$, and $w_2$ be nodes of $D$
such that $w_1$ is an ancestor of $v_1$ and $w_2$ is an ancestor of $v_2$. Then, the property
that, for each expression $e$ in the language under consideration, $(v_1,w_1) \in e(D)$
if and only if $(v_2,w_2) \in e(D)$ is equivalent to the property $(v_1,w_1) \cong_\uparrow (v_2,w_2)$.

Theorem 7.3. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $R \subseteq V \times V$. Then,
there exists an expression $e$ in the strictly upward (core) XPath algebra such
that $e(D) = R$ if and only if,

1. for all $v, w \in V$, $(v,w) \in R$ implies $w$ is an ancestor of $v$; and,

2. for all $v_1, w_1, v_2, w_2 \in V$ with $w_1$ an ancestor of $v_1$, $w_2$ an ancestor of $v_2$,
and $(v_1,w_1) \cong_\uparrow (v_2,w_2)$, $(v_1,w_1) \in R$ implies $(v_2,w_2) \in R$.

Theorem 7.4. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $R \subseteq V \times V$. Then,
there exists an expression $e$ in the strictly upward (core) positive XPath algebra
such that $e(D) = R$ if and only if,

1. for all $v, w \in V$, $(v,w) \in R$ implies $w$ is an ancestor of $v$; and,

2. for all $v_1, w_1, v_2, w_2 \in V$ with $w_1$ an ancestor of $v_1$, $w_2$ an ancestor of $v_2$,
and $(v_1,w_1) \gtrsim_\uparrow (v_2,w_2)$, $(v_1,w_1) \in R$ implies $(v_2,w_2) \in R$.


The difference between the strictly upward (core) XPath algebra and the
strictly upward (core) positive XPath algebra only becomes apparent in the
BP-characterization: in Theorem 7.3, $R$ is a union of equivalence classes under
$\cong_\uparrow$, whereas in Theorem 7.4, $R$ is merely closed under the relation $\gtrsim_\uparrow$.


7.2. Weakly upward languages 

Weakly upward languages are closely related to weakly downward languages, 
by the following result. 


Theorem 7.5. Let $E$ be a set of nonbasic operations not containing upward
navigation (“$\uparrow$”) and inverse (“$\cdot^{-1}$”). Let $E'$ be the set of nonbasic operations
obtained from $E$ by replacing downward navigation by upward navigation (“$\uparrow$”),
first projection (“$\pi_1$”) by second projection (“$\pi_2$”), and second projection by first
projection. Then, for each expression $e$ in $\mathcal{X}(E)$ (respectively, $\mathcal{C}(E)$), there is an
expression $e'$ in $\mathcal{X}(E')$ (respectively, $\mathcal{C}(E')$) such that $e^{-1}$ and $e'$ are equivalent
at the level of queries, and vice versa.


Proof. Starting from $e^{-1}$, we eliminate inverse (“$\cdot^{-1}$”) using the identities in the
proof of Proposition 2.3 and the additional identities $\pi_1(e)^{-1}(D) = \pi_2(e^{-1})(D)$
and $\pi_2(e)^{-1}(D) = \pi_1(e^{-1})(D)$, for $D$ an arbitrary document. This elimination
process yields the desired expression $e'$. □
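
The elimination process in this proof can be sketched as a recursive rewrite on expression trees. The code below is an illustration under stated assumptions: expressions are nested tuples over a hypothetical operation set (`eps`, `up`, `down`, `pi1`, `pi2`, `comp`, `union`, `inter`, `diff`, `inv`), label tests are omitted, the projections are assumed to return identity pairs on the first, respectively second, components, and the rules for composition, union, intersection, and difference are the standard relational identities rather than a quotation of Proposition 2.3.

```python
def compose(a, b):
    """Relational composition of two sets of pairs."""
    return {(v, w) for (v, x) in a for (y, w) in b if x == y}


def evaluate(e, nodes, parent):
    """Evaluate an expression (nested tuples) on a tree given as a child-to-parent map."""
    op = e[0]
    if op == "eps":
        return {(v, v) for v in nodes}
    if op == "up":
        return set(parent.items())
    if op == "down":
        return {(p, c) for c, p in parent.items()}
    if op == "inv":
        return {(w, v) for v, w in evaluate(e[1], nodes, parent)}
    if op == "pi1":
        return {(v, v) for v, _ in evaluate(e[1], nodes, parent)}
    if op == "pi2":
        return {(w, w) for _, w in evaluate(e[1], nodes, parent)}
    if op not in ("comp", "union", "inter", "diff"):
        raise ValueError(f"unknown operation: {op}")
    left = evaluate(e[1], nodes, parent)
    right = evaluate(e[2], nodes, parent)
    if op == "comp":
        return compose(left, right)
    if op == "union":
        return left | right
    if op == "inter":
        return left & right
    return left - right


def mirror(e):
    """For an inverse-free e, return an inverse-free expression equivalent to the inverse of e."""
    op = e[0]
    if op == "eps":
        return e
    if op == "down":
        return ("up",)
    if op == "up":
        return ("down",)
    if op == "pi1":                      # pi1(e)^-1 = pi2(e^-1)
        return ("pi2", mirror(e[1]))
    if op == "pi2":                      # pi2(e)^-1 = pi1(e^-1)
        return ("pi1", mirror(e[1]))
    if op == "comp":                     # (e1/e2)^-1 = e2^-1/e1^-1
        return ("comp", mirror(e[2]), mirror(e[1]))
    if op in ("union", "inter", "diff"):
        return (op, mirror(e[1]), mirror(e[2]))
    raise ValueError(f"unsupported operation: {op}")


# Hypothetical document: root r with children a and b; a has one child c.
nodes = {"r", "a", "b", "c"}
parent = {"a": "r", "b": "r", "c": "a"}
e = ("comp", ("pi1", ("comp", ("down",), ("down",))), ("down",))
print(mirror(e))
print(evaluate(mirror(e), nodes, parent) == evaluate(("inv", e), nodes, parent))
```

Note how a downward expression with first projection is turned into an upward expression with second projection, which is the substitution described in Theorem 7.5.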


Together with the fact that, in a subsumption or congruence, the order
of the nodes in the pairs on the left- and right-hand sides may be swapped
simultaneously (Proposition 3.5), Theorem 7.5 has the following
immediate consequences:


1. Each characterization for a weakly downward language in Section 6,
which in each instance contains both projections, yields a characterization
for the corresponding weakly upward language (i.e., obtained by
substituting upward navigation for downward navigation) by replacing
“descendant” by “ancestor”; and


Observe that $\pi_1(e)^{-1}(D) = \pi_1(e)(D)$ and $\pi_2(e)^{-1}(D) = \pi_2(e)(D)$ are also valid identities
if the sole purpose were to eliminate inverse; however, these identities will not lead to the desired
result.

2. Each characterization for a strictly downward language in Section 5,
which in each instance contains the first projection, yields a characterization
for the corresponding weakly upward language (i.e., obtained by
substituting upward navigation for downward navigation and second for
first projection) by replacing “descendant” by “ancestor”.

Moreover, Theorem 7.5 gives us for free characterizations for some additional
weakly downward languages not considered in Section 6:

Each characterization for a strictly upward language in Section 7.1, which
in each instance contains the first projection, yields a characterization
for the corresponding weakly downward language (i.e., obtained by
substituting downward navigation for upward navigation and second for first
projection) by replacing “ancestor” by “descendant”.

In view of space considerations, however, we refrain from explicitly writing down 
these new characterization results. 


8. Languages for two-way navigation 

We finally consider languages which are neither downward nor upward, i.e.,
in which navigation in both directions (“$\downarrow$” and “$\uparrow$”) is possible. A notable
difference in this case is that standard languages no longer always coincide
with their associated core languages in expressive power. Below we distinguish
languages with and without difference. In the first case, we discuss the standard
languages and the core languages separately (Sections 8.1 and 8.2). In the second
case, there is no need for this distinction (Section 8.3).


8.1. Standard languages with difference for two-way navigation 


First, we state analogues to Lemmas 6.1 and 6.2 for pairs of nodes that are
not necessarily ancestor-descendant pairs.


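Lemma 8.1. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, let $v_1$, $w_1$, $v_2$, and $w_2$
be nodes of $D$, and let $k \geq 1$. Then, $(v_1,w_1) \cong^k_\updownarrow (v_2,w_2)$ if and only if
$(v_1,w_1) \cong (v_2,w_2)$, $v_1 \equiv^k_\updownarrow v_2$, and $w_1 \equiv^k_\updownarrow w_2$.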


Proof. As the “only if” is obvious, we focus on the “if.” Obviously, $(v_1,w_1) \cong
(v_2,w_2)$ implies that $(\mathrm{top}(v_1,w_1),w_1) \cong (\mathrm{top}(v_2,w_2),w_2)$. By Lemma 6.1,
$(\mathrm{top}(v_1,w_1),w_1) \cong^k_\updownarrow (\mathrm{top}(v_2,w_2),w_2)$. In the same way, we derive
$(\mathrm{top}(v_1,w_1),v_1) \cong^k_\updownarrow (\mathrm{top}(v_2,w_2),v_2)$. Applying Proposition 3.5
yields the desired result. □
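
Lemma 8.1 reduces the congruence test on pairs to a comparison of signatures plus an equivalence test on the endpoints. The sketch below illustrates this under several assumptions not confirmed by this section: $\mathrm{top}(v,w)$ is taken to be the lowest common ancestor, the signature is represented as a pair $(u,d)$ of upward and downward step counts, plain congruence is taken to mean equal signatures, and the node equivalence is supplied by the caller as a hypothetical function `node_equiv`.

```python
def path_to_root(parent, v):
    """List of nodes from v up to the root, v first."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path


def top(parent, v, w):
    """Lowest common ancestor of v and w (assumed meaning of top(v, w))."""
    ancestors_of_w = set(path_to_root(parent, w))
    for x in path_to_root(parent, v):
        if x in ancestors_of_w:
            return x
    raise ValueError("nodes are not in the same tree")


def sig(parent, v, w):
    """Signature as (u, d): u steps up from v to top(v, w), then d steps down to w."""
    t = top(parent, v, w)
    return path_to_root(parent, v).index(t), path_to_root(parent, w).index(t)


def k_congruent(parent, p, q, node_equiv):
    """Test in the shape of Lemma 8.1: equal signatures and equivalent endpoints."""
    (v1, w1), (v2, w2) = p, q
    return (sig(parent, v1, w1) == sig(parent, v2, w2)
            and node_equiv(v1, v2) and node_equiv(w1, w2))


# Hypothetical document: r has children a and b; a has child c; b has child d.
parent = {"a": "r", "b": "r", "c": "a", "d": "b"}
print(top(parent, "c", "d"), sig(parent, "c", "d"), sig(parent, "c", "a"))
```

Passing the node equivalence as a parameter keeps the sketch independent of the precise definition of $k$-equivalence, which is given elsewhere in the paper.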


Lemma 8.2. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, let $v_1$ and $w_1$ be nodes of $D$,
and let $k \geq 2$. Then,

1. for each node $v_2$ of $D$ for which $v_1 \equiv^k_\updownarrow v_2$ there is a node $w_2$ in $D$ such
that $(v_1,w_1) \cong^k_\updownarrow (v_2,w_2)$; and

2. for each node $w_2$ of $D$ for which $w_1 \equiv^k_\updownarrow w_2$ there is a node $v_2$ in $D$ such
that $(v_1,w_1) \cong^k_\updownarrow (v_2,w_2)$.


Proof. We only prove (1); the proof of (2) is completely analogous. By Lemma 6.2,
there exists a node $t_2$ in $D$ such that $(\mathrm{top}(v_1,w_1),v_1) \cong^k_\updownarrow (t_2,v_2)$, and,
hence, also that $(v_1,\mathrm{top}(v_1,w_1)) \cong^k_\updownarrow (v_2,t_2)$, by Proposition 3.5.
Let $y_1$ be the child of $\mathrm{top}(v_1,w_1)$ on the path to $w_1$. Since $k \geq 2$, there is a child
$y_2$ of $t_2$ such that (1) $y_1 \equiv^k_\updownarrow y_2$ and (2) $y_2$ is not on the path from $t_2$ to $v_2$. By
Lemma 6.2, there exists a node $w_2$ in $D$ such that $(y_1,w_1) \cong^k_\updownarrow (y_2,w_2)$. In
particular, $w_1 \equiv^k_\updownarrow w_2$. By construction, $t_2 = \mathrm{top}(v_2,w_2)$, and, hence, $(v_1,w_1) \cong
(v_2,w_2)$. The result now follows from Lemma 8.1. □


The mutual position of the nodes in the statement and the proof of Lemma 8.2, (1),
is illustrated in Figure 5.

Figure 5: Mutual position of the nodes in the statement and the proof of Lemma 8.2, (1).

Before we can start with establishing the relationship between $\cong^k_\updownarrow$-congruence
and expression equivalence under languages allowing two-way navigation, we
need one more lemma to be able to deal with the composition operator.

Lemma 8.3. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, let $v_1$, $w_1$, $v_2$, and $w_2$ be
nodes of $D$ such that $(v_1,w_1) \cong^k_\updownarrow (v_2,w_2)$, and let $k \geq 3$. Then, for every
node $y_1$ of $D$, there exists a node $y_2$ of $D$ such that $(v_1,y_1) \cong^k_\updownarrow (v_2,y_2)$, and
$(y_1,w_1) \cong^k_\updownarrow (y_2,w_2)$.

Proof. The proof is essentially a case analysis. In each case description, we
assume implicitly that the cases that were already dealt with before are excluded.


To see the latter claim, observe that $t_2$ must have two different $k$-equivalent children when
$\mathrm{top}(v_1,w_1)$ has.

1. $y_1$ is on the path from $v_1$ to $w_1$. In that case, let $y_2$ be the node
corresponding to $y_1$ on the path from $v_2$ to $w_2$. The result now follows
immediately.

2. $y_1$ is a strict descendant of $v_1$. By Lemma 8.2, (1), there is a (strict)
descendant $y_2$ of $v_2$ such that $(v_1,y_1) \cong^k_\updownarrow (v_2,y_2)$. The result now follows
immediately.

3. $y_1$ is a strict descendant of $w_1$. Analogous to the previous case.

4. $y_1$ is a strict ancestor of $\mathrm{top}(v_1,w_1)$. By Lemma 6.2, there is a (strict)
ancestor $y_2$ of $\mathrm{top}(v_2,w_2)$ such that $(\mathrm{top}(v_1,w_1),y_1) \cong^k_\updownarrow (\mathrm{top}(v_2,w_2),y_2)$.
The result now follows immediately.

5. $\mathrm{top}(v_1,y_1)$ is an internal node on the path from $v_1$ to $\mathrm{top}(v_1,w_1)$. By
Lemma 8.2, (1), there exists a node $y_2$ in $D$ such that $(v_1,y_1) \cong^k_\updownarrow (v_2,y_2)$.
Since, in this case, $\mathrm{top}(y_1,w_1) = \mathrm{top}(v_1,w_1)$, and, therefore, an ancestor
of $v_1$, we may apply Proposition 3.5 to obtain that $(y_1,w_1) \cong
(y_2,w_2)$. Since, moreover, $y_1 \equiv^k_\updownarrow y_2$ and $w_1 \equiv^k_\updownarrow w_2$, the desired result now
follows from Lemma 8.1.

Figure 6 illustrates this case and the constructions therein.

Figure 6: Mutual position of the nodes in Case 5 of the proof of Lemma 8.3.


6. $\mathrm{top}(y_1,w_1)$ is an internal node on the path from $\mathrm{top}(v_1,w_1)$ to $w_1$.
Analogous to the previous case.


7. $\mathrm{top}(v_1,y_1) = \mathrm{top}(y_1,w_1)$ is a strict ancestor of $\mathrm{top}(v_1,w_1)$. By Lemma 8.2, (1),
there exists a node $y_2$ in $D$ such that $(v_1,y_1) \cong^k_\updownarrow (v_2,y_2)$. Since, in
this case, $\mathrm{top}(y_1,w_1)$ is a (strict) ancestor of $\mathrm{top}(v_1,w_1)$, and, therefore,
an ancestor of $v_1$, we may apply Proposition 3.5 to obtain that
$(y_1,w_1) \cong (y_2,w_2)$. Since, moreover, $y_1 \equiv^k_\updownarrow y_2$ and $w_1 \equiv^k_\updownarrow w_2$, the desired
result now follows from Lemma 8.1.

Figure 7 illustrates this case and the constructions therein.

Figure 7: Mutual position of the nodes in Case 7 of the proof of Lemma 8.3.


8. $\mathrm{top}(v_1,y_1) = \mathrm{top}(y_1,w_1) = \mathrm{top}(v_1,w_1)$. Let $z_1$ be the child of the top
node on the path to $y_1$. By assumption, $z_1$ is not on the path from $v_1$ to
$w_1$. Since $k \geq 3$, there is a node $z_2$ in $D$ not on the path from $v_2$ to $w_2$
such that $z_1 \equiv^k_\updownarrow z_2$. (For example, in the subcase where the children of
$\mathrm{top}(v_1,w_1)$ on the paths to $v_1$, $w_1$, and $y_1$, the last of which is $z_1$, are all
three $k$-equivalent, we know that $\mathrm{top}(v_2,w_2)$ must also have at least three
children that are $k$-equivalent to $z_1$. Hence, at least one of these is not on
the path from $v_2$ to $w_2$.) By Lemma 8.2, (1), there exists a node $y_2$ in $D$
such that $(v_1,y_1) \cong^k_\updownarrow (v_2,y_2)$. The result now follows readily.

Figure 8 illustrates this case and the constructions therein.

Figure 8: Mutual position of the nodes in Case 8 of the proof of Lemma 8.3.


We are now ready to state the analogue of Proposition 6.3 for languages
with two-way navigation.

Proposition 8.4. Let $k \geq 3$, and let $E$ be the set of all nonbasic operations
in Table 1 except for selection on at least $m$ children satisfying some condition
(“$\mathrm{ch}_{\geq m}(\cdot)$”) for $m > k$. Let $e$ be an expression in $\mathcal{X}(E)$. Let $D = (V, \mathit{Ed}, r, \lambda)$
be a document, and let $v_1$, $w_1$, $v_2$, and $w_2$ be nodes of $D$. Assume furthermore
that $(v_1,w_1) \cong^k_\updownarrow (v_2,w_2)$. Then, $(v_1,w_1) \in e(D)$ if and only if $(v_2,w_2) \in e(D)$.

Proof. The proof goes along the same lines as the proofs of Propositions 6.3
and 5.2. The base case, for the atomic operators, remains straightforward. In
the induction step, we must now rely on Lemma 8.3 to make the case for
composition (“$/$”). To make the case for first, respectively, second projection (“$\pi_1$,”
respectively, “$\pi_2$”), we must rely on Lemma 8.2, (1), respectively, (2). The
arguments for the counting operations (“$\mathrm{ch}_{\geq m}(\cdot)$,” $m \leq k$), union (“$\cup$”), intersection
(“$\cap$”), and set difference (“$-$”) in the proof of Proposition 5.2 carry over to the
present setting. Finally, the case for inverse (“$\cdot^{-1}$”) is straightforward. □

As in Section 6.2, we can in two steps infer the following result from
Proposition 8.4.


Corollary 8.5. Let $k \geq 3$, and let $E$ be a set of nonbasic operations not
containing selection on at least $m$ children satisfying some condition (“$\mathrm{ch}_{\geq m}(\cdot)$”)
for $m > k$. Consider the language $\mathcal{X}(E)$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document,
and let $v_1$ and $v_2$ be nodes of $D$. If $v_1 \equiv^k_\updownarrow v_2$, then $v_1 \equiv_{\exp} v_2$.


Now, notice that Proposition 6.6 of Section 6.2 is also applicable to an
important class of languages allowing two-way navigation. The standard
language for two-way navigation satisfying both Corollary 8.5 and Proposition 6.6
is $\mathcal{X}(\downarrow,\uparrow,\mathrm{ch}_{\geq 1}(\cdot),\ldots,\mathrm{ch}_{\geq k}(\cdot),-)$. We call this language the XPath algebra
with counting up to $k$. Combining the aforementioned results, we obtain the
following.


Theorem 8.6. Let $k \geq 3$, and consider the XPath algebra with counting up
to $k$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $v_1$ and $v_2$ be nodes of $D$.
Then, $v_1 \equiv_{\exp} v_2$ if and only if $v_1 \equiv^k_\updownarrow v_2$.

By Proposition 2.4, selection on up to three children satisfying some condition
(“$\mathrm{ch}_{\geq m}(\cdot)$,” $1 \leq m \leq 3$) can be expressed in the XPath algebra. Hence, a
special case arises for $k = 3$:

Corollary 8.7. Consider the XPath algebra. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document,
and let $v_1$ and $v_2$ be nodes of $D$. Then, $v_1 \equiv_{\exp} v_2$ if and only if $v_1 \equiv^3_\updownarrow v_2$.
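
To make the counting operations concrete, the following sketch evaluates a selection on at least $m$ children on a small document. It rests on an assumed reading of the semantics, namely that a child "satisfies the condition" when the condition relation contains at least one pair starting at that child; the function names and the sample tree are hypothetical.

```python
def children_map(parent):
    """Invert a child-to-parent map into a parent-to-children map."""
    kids = {}
    for child, p in parent.items():
        kids.setdefault(p, set()).add(child)
    return kids


def ch_at_least(m, condition, nodes, parent):
    """Pairs (v, v) for nodes v having at least m children w that start a pair in condition."""
    satisfying = {v for v, _ in condition}
    kids = children_map(parent)
    return {(v, v) for v in nodes if len(kids.get(v, set()) & satisfying) >= m}


# Hypothetical document: r has children a, b, c; a has one child d.
nodes = {"r", "a", "b", "c", "d"}
parent = {"a": "r", "b": "r", "c": "r", "d": "a"}
eps = {(v, v) for v in nodes}                 # every node satisfies this condition
down = {(p, c) for c, p in parent.items()}    # satisfied by nodes that have a child

print(ch_at_least(3, eps, nodes, parent))
print(ch_at_least(1, down, nodes, parent))
```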

We next prove a converse to Proposition 8.4.

Proposition 8.8. Let $k \geq 3$, and consider the XPath algebra with counting
up to $k$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $v_1$, $w_1$, $v_2$, and $w_2$ be
nodes of $D$. Assume furthermore that, for each expression $e$ in the language,
$(v_1,w_1) \in e(D)$ if and only if $(v_2,w_2) \in e(D)$. Then $(v_1,w_1) \cong^k_\updownarrow (v_2,w_2)$.


The other operations are redundant, by Proposition 2.3.

Proof. Since $\mathrm{sig}(v_1,w_1)$ is an expression in the language under consideration,
and since $(v_1,w_1) \in \mathrm{sig}(v_1,w_1)(D)$, $(v_2,w_2) \in \mathrm{sig}(v_1,w_1)(D)$. Similarly, $(v_1,w_1) \in
\mathrm{sig}(v_2,w_2)(D)$. We may thus conclude that $(v_1,w_1) \cong (v_2,w_2)$. Now, let $f$ be any
expression in the language such that $f(D)(v_1) \neq \emptyset$. Then, $(v_1,v_1) \in \pi_1(f)(D)$.
Let $e := \pi_1(f)/\mathrm{sig}(v_1,w_1)$. By construction, $(v_1,w_1) \in e(D)$. Hence, by
assumption, $(v_2,w_2) \in e(D)$, which implies $(v_2,v_2) \in \pi_1(f)(D)$, or, equivalently, $f(D)(v_2) \neq \emptyset$.
The same holds vice versa, and we may thus conclude that $v_1 \equiv_{\exp} v_2$, and,
hence, by Theorem 8.6, $v_1 \equiv^k_\updownarrow v_2$. In a similar way, we prove that $w_1 \equiv^k_\updownarrow w_2$.
By Lemma 8.1, we may now conclude that $(v_1,w_1) \cong^k_\updownarrow (v_2,w_2)$. □


Combining Propositions 8.4 and 8.8, we obtain the following characterization.


Corollary 8.9. Let $k \geq 3$, and consider the XPath algebra with counting up
to $k$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $v_1$, $w_1$, $v_2$, and $w_2$ be nodes
of $D$. Then, the property that, for each expression $e$ in the language under
consideration, $(v_1,w_1) \in e(D)$ if and only if $(v_2,w_2) \in e(D)$ is equivalent to the
property $(v_1,w_1) \cong^k_\updownarrow (v_2,w_2)$.


Using Theorem 8.6 instead of Theorem 5.7, we can recast the proof of
Lemma 5.11 into a proof of


Lemma 8.10. Let $k \geq 3$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $v_1$ be a
node of $D$. There exists an expression $e_{v_1}$ in the XPath algebra with counting
up to $k$ such that, for each node $v_2$ of $D$, $e_{v_1}(D)(v_2) \neq \emptyset$ if and only if $v_1 \equiv^k_\updownarrow v_2$.

We can now bootstrap Lemma 8.10 to the following result.


Lemma 8.11. Let $k \geq 3$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $v_1$ and
$w_1$ be nodes of $D$. There exists an expression $e_{v_1,w_1}$ in the XPath algebra with
counting up to $k$ such that, for all nodes $v_2$ and $w_2$ of $D$, $(v_2,w_2) \in e_{v_1,w_1}(D)$
if and only if $(v_1,w_1) \cong^k_\updownarrow (v_2,w_2)$.


Proof. From Lemma 8.10, we know that, for each node $y_1$ of $D$, there exists an
expression $e_{y_1}$ in the XPath algebra with counting up to $k$ such that, for each
node $y_2$ of $D$, $e_{y_1}(D)(y_2) \neq \emptyset$ if and only if $y_1 \equiv^k_\updownarrow y_2$. Now, let $v_1$ and $w_1$ be
nodes of $D$. Let $\mathrm{sig}(v_1,w_1) = \uparrow^u/\downarrow^d$ with $u, d \geq 0$, and define

$$e_{v_1,w_1} := \pi_1(e_{v_1})/\mathrm{sig}(v_1,w_1)/\pi_1(e_{w_1}) - \uparrow^{u-1}/\downarrow^{d-1},$$

where, for an expression $f$, we define $f^{-1} := \emptyset$. Clearly, $e_{v_1,w_1}$ is also in the
XPath algebra with counting up to $k$. Let $v_2$ and $w_2$ be nodes of $D$.
Suppose $(v_2,w_2) \in e_{v_1,w_1}(D)$. Then, as established earlier, $\mathrm{sig}(v_1,w_1) = \mathrm{sig}(v_2,w_2)$.
Furthermore, it follows that $(v_2,v_2) \in \pi_1(e_{v_1})(D)$ and $(w_2,w_2) \in \pi_1(e_{w_1})(D)$. By
Lemma 8.10, $v_1 \equiv^k_\updownarrow v_2$ and $w_1 \equiv^k_\updownarrow w_2$. It now follows from Lemma 8.1 that
$(v_1,w_1) \cong^k_\updownarrow (v_2,w_2)$. As $(v_1,w_1) \in e_{v_1,w_1}(D)$, the converse follows from
Corollary 8.9. □

The BP characterization results now follow readily.

Theorem 8.12. Let $k \geq 3$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let
$R \subseteq V \times V$. Then, there exists an expression $e$ in the XPath algebra with
counting up to $k$ such that $e(D) = R$ if and only if, for all $v_1, w_1, v_2, w_2 \in V$
with $(v_1,w_1) \cong^k_\updownarrow (v_2,w_2)$, $(v_1,w_1) \in R$ implies $(v_2,w_2) \in R$.
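
The closure condition in this BP-type characterization is easy to test once the congruence between pairs is known. The sketch below assumes the congruence is supplied explicitly as a set of (pair, pair) entries; the entries used here are hand-assigned for illustration on a hypothetical document with two sibling nodes `v` and `w`, and the helper names are not from the paper.

```python
def is_closed_under(R, related):
    """True if, whenever p is in R and (p, q) is in related, q is in R as well."""
    return all(q in R for (p, q) in related if p in R)


def closure(R, related):
    """Smallest superset of R that is closed under the given relation between pairs."""
    result = set(R)
    changed = True
    while changed:
        changed = False
        for p, q in related:
            if p in result and q not in result:
                result.add(q)
                changed = True
    return result


# Hand-assigned, purely illustrative congruence between pairs of nodes.
related = {
    (("v", "w"), ("w", "v")),
    (("w", "v"), ("v", "w")),
}

print(is_closed_under({("v", "w")}, related))
print(is_closed_under({("v", "w"), ("w", "v")}, related))
print(closure({("v", "w")}, related))
```

By the theorem, a relation that fails this test cannot be the result of any expression in the language; one that passes it can.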

The specialization to the XPath algebra is as follows. 

Corollary 8.13. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $R \subseteq V \times V$.
There exists an expression $e$ in the XPath algebra such that $e(D) = R$ if and
only if, for all $v_1, w_1, v_2, w_2 \in V$ with $(v_1,w_1) \cong^3_\updownarrow (v_2,w_2)$, $(v_1,w_1) \in R$ implies
$(v_2,w_2) \in R$.

We recast Theorem 8.12 and Corollary 8.13 in terms of node-level navigation.

Theorem 8.14. Let $k \geq 3$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, let $v$ be a node
of $D$, and let $W \subseteq V$. Then there exists an expression $e$ in the XPath algebra
with counting up to $k$ such that $e(D)(v) = W$ if and only if, for all $w_1, w_2 \in V$
with $(v,w_1) \cong^k_\updownarrow (v,w_2)$, $w_1 \in W$ implies $w_2 \in W$.

The specialization to the XPath algebra is as follows. 

Corollary 8.15. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, let $v$ be a node of $D$,
and let $W \subseteq V$. Then there exists an expression $e$ in the XPath algebra such
that $e(D)(v) = W$ if and only if, for all $w_1, w_2 \in V$ with $(v,w_1) \cong^3_\updownarrow (v,w_2)$,
$w_1 \in W$ implies $w_2 \in W$.


Finally, we consider the special case where navigation starts from the root.
For $v = r$, the condition $(v,w_1) \cong^k_\updownarrow (v,w_2)$ reduces to $w_1 \equiv^k_\updownarrow w_2$, by
Proposition 4.17 and Lemma 6.1. Comparing Theorem 8.14 and Corollary 8.15 with,
respectively, Theorem 5.17 and Corollary 5.18 then immediately yields the
following.

Theorem 8.16. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document.

1. For each expression $e$ in the XPath algebra with counting up to $k$, $k \geq 3$,
there exists an expression $e'$ in the strictly downward (core) XPath algebra
with counting up to $k$ such that $e(D)(r) = e'(D)(r)$.

2. For each expression $e$ in the XPath algebra, there exists an expression $e'$
in the strictly downward (core) XPath algebra with counting up to 3 such
that $e(D)(r) = e'(D)(r)$.

Theorem 8.16 extends Theorem 6.15. When navigating from the root, the
only thing that the full XPath algebra adds compared to using the strictly
downward (core) XPath algebra is its ability to select on at least 2 and on at
least 3 children satisfying some condition.

Figure 9: Document of Example 8.17.


8.2. Core languages with difference for two-way navigation 

We now investigate what changes if we replace a standard language with 
difference for two-way navigation by the corresponding core language. The 
most important observation is that both languages are not equivalent, unlike in 
the cases of downward or upward navigation. 

Example 8.17. Let $D = (V, \mathit{Ed}, r, \lambda)$ be the very simple document in Figure 9.
For every value of $k \geq 2$, $e := \uparrow/\downarrow - \varepsilon$ is an expression in the XPath algebra with
counting up to $k$. We have that $e(D) = \{(v,w), (w,v)\}$. From Proposition 8.19,
it will follow, however, that, for every expression $e'$ in the corresponding core
language, $(v,w) \in e'(D)$ implies that not only $(w,v) \in e'(D)$, but also $(v,v) \in
e'(D)$ and $(w,w) \in e'(D)$.
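
The computation in this example can be replayed directly. The sketch below assumes that the document of Figure 9 consists of a root $r$ with exactly two children $v$ and $w$ (the figure itself is not reproduced here, so this shape is an assumption consistent with the stated result), and that $\varepsilon$ denotes the identity relation on nodes.

```python
# Assumed shape of the document: root r with two children v and w.
parent = {"v": "r", "w": "r"}
nodes = {"r", "v", "w"}

up = set(parent.items())                      # pairs (child, parent)
down = {(p, c) for c, p in parent.items()}    # pairs (parent, child)
eps = {(n, n) for n in nodes}                 # identity relation


def compose(a, b):
    """Relational composition of two sets of pairs."""
    return {(x, z) for (x, y) in a for (y2, z) in b if y == y2}


up_down = compose(up, down)   # go to the parent, then to any child
e = up_down - eps             # remove the pairs that return to the start node
print(sorted(e))
```

The difference is what removes $(v,v)$ and $(w,w)$ from the result; this is the step that, according to the example, has no counterpart in the corresponding core language.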


We now explore which changes occur when we try to make the same reasoning
as in Section 8.1. As Example 8.17 suggests, there is no hope that we can express
congruence in the core XPath algebra with counting up to $k$, for any $k \geq 2$.
Therefore we shall have to work with subsumption instead of congruence.

Lemma 8.1 still holds if we replace congruence by subsumption. We may of
course still use Lemma 8.2 (as replacing congruence by subsumption here would
yield a weaker statement). Lemma 8.3 also survives replacing congruence by
subsumption, except that we can then strengthen its statement, as follows.


Lemma 8.18. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, let $v_1$, $w_1$, $v_2$, and $w_2$ be
nodes of $D$ such that $(v_1,w_1) \gtrsim^k_\updownarrow (v_2,w_2)$, and let $k \geq 2$. Then, for every
node $y_1$ of $D$, there exists a node $y_2$ of $D$ such that $(v_1,y_1) \gtrsim^k_\updownarrow (v_2,y_2)$, and
$(y_1,w_1) \gtrsim^k_\updownarrow (y_2,w_2)$.

Proof. The only case in the proof of Lemma 8.3 where we used $k \geq 3$ is
Case 8 ($\mathrm{top}(v_1,y_1) = \mathrm{top}(y_1,w_1) = \mathrm{top}(v_1,w_1)$) to guarantee that the path
from $\mathrm{top}(v_2,w_2)$ to $y_2$ has no overlap with both the path from $\mathrm{top}(v_2,w_2)$ to $v_2$
and the path from $\mathrm{top}(v_2,w_2)$ to $w_2$. As this is no concern anymore when we
consider subsumption rather than congruence, the condition $k \geq 2$ suffices to
recast the proof of Lemma 8.3 into a proof of Lemma 8.18. □

We will not consider $k = 1$, because both $\mathrm{ch}_{\geq 1}(\cdot)$ and $\mathrm{ch}_{\geq 2}(\cdot)$ can be expressed in the
core XPath algebra, by Proposition 2.4.

This is the name we give to the core language corresponding to the (standard) XPath
algebra with counting up to $k$.


Lemma 8.3 was used to complete the induction step for composition (“$/$”)
in the proof of Proposition 8.4. If we replace Lemma 8.3 by Lemma 8.18, we
can also avoid making the assumption $k \geq 3$ here. Thanks to the restricted use
of difference in core languages, we can also get away with subsumption instead
of congruence.


Proposition 8.19. Let $k \geq 2$, and let $E$ be the set of all nonbasic operations
in Table 1 except for selection on at least $m$ children satisfying some condition
(“$\mathrm{ch}_{\geq m}(\cdot)$”) for $m > k$. Let $e$ be an expression in $\mathcal{C}(E)$. Let $D = (V, \mathit{Ed}, r, \lambda)$
be a document, and let $v_1$, $w_1$, $v_2$, and $w_2$ be nodes of $D$. Assume furthermore
that $(v_1,w_1) \gtrsim^k_\updownarrow (v_2,w_2)$. Then, $(v_1,w_1) \in e(D)$ implies $(v_2,w_2) \in e(D)$.


Proof. The proof goes along the same lines as the proof of Proposition 8.4,
except that, in the induction step, we need not consider the case of set difference
(“$-$”). However, we must consider instead the case where the expression is of
the form $e := \pi_1(f)$ or $e := \pi_2(f)$ with $f$ a Boolean combination of expressions
of $\mathcal{C}(E)$ satisfying the induction hypothesis. For reasons of symmetry, we only
consider the case $e := \pi_1(f)$. Without loss of generality, we may assume that
$f$ is union-free. Indeed, we can always rewrite $f$ in disjunctive normal form,
and, for $f = f_1 \cup f_2$, $\pi_1(f) = \pi_1(f_1) \cup \pi_1(f_2)$. If, for an expression $g$ in
$\mathcal{C}(E)$, we define $\overline{g}$ by $\overline{g}(D) := V \times V - g(D)$, we can write $f = f_1 \cap \ldots \cap f_p \cap
\overline{g_1} \cap \ldots \cap \overline{g_q}$ for some $p \geq 1$ and $q \geq 0$, with $f_1, \ldots, f_p, g_1, \ldots, g_q$ in $\mathcal{C}(E)$ and
satisfying the induction hypothesis. In particular, if $(v_1,v_1) \in \pi_1(f)(D)$,
there exists a node $y_1$ in $D$ such that $(v_1,y_1) \in f_1(D), \ldots, (v_1,y_1) \in f_p(D)$ and
$(v_1,y_1) \notin g_1(D), \ldots, (v_1,y_1) \notin g_q(D)$. By Lemma 8.2, there exists a node $y_2$ in
$D$ such that $(v_1,y_1) \cong^k_\updownarrow (v_2,y_2)$. Hence, $(v_1,y_1) \gtrsim^k_\updownarrow (v_2,y_2)$ and $(v_2,y_2) \gtrsim^k_\updownarrow
(v_1,y_1)$. By the induction hypothesis, $(v_2,y_2) \in f_1(D), \ldots, (v_2,y_2) \in f_p(D)$.
Now, assume that, for some $j$, $1 \leq j \leq q$, $(v_2,y_2) \in g_j(D)$. Then, again by
the induction hypothesis, $(v_1,y_1) \in g_j(D)$, a contradiction. Hence, $(v_2,y_2) \notin
g_1(D), \ldots, (v_2,y_2) \notin g_q(D)$. We may thus conclude that $(v_2,v_2) \in \pi_1(f)(D)$. □
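
The reduction to union-free $f$ in this proof relies on first projection distributing over union. The following sketch checks that identity on two small relations and also shows that the analogous statement for intersection fails, which is why the intersections must stay inside the projection. The relations `f1` and `f2` are arbitrary illustrative data, and `pi1` implements the assumed semantics of first projection as identity pairs on first components.

```python
def pi1(rel):
    """First projection: identity pairs on the first components of rel."""
    return {(v, v) for v, _ in rel}


f1 = {(1, 2), (3, 4)}
f2 = {(1, 3), (5, 6)}

# Projection distributes over union.
print(pi1(f1 | f2) == pi1(f1) | pi1(f2))

# It does not distribute over intersection: node 1 has an outgoing pair in f1
# and one in f2, but no pair that lies in both.
print(pi1(f1 & f2), pi1(f1) & pi1(f2))
```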


By applying Proposition 8.19 twice, we obtain the following.


Corollary 8.20. Let $k \geq 2$, and let $E$ be the set of all nonbasic operations in
Table 1 except for selection on at least $m$ children satisfying some condition
(“$\mathrm{ch}_{\geq m}(\cdot)$”) for $m > k$. Let $e$ be an expression in $\mathcal{C}(E)$. Let $D = (V, \mathit{Ed}, r, \lambda)$
be a document, and let $v_1$, $w_1$, $v_2$, and $w_2$ be nodes of $D$. Assume furthermore
that $(v_1,w_1) \cong^k_\updownarrow (v_2,w_2)$. Then, $(v_1,w_1) \in e(D)$ if and only if $(v_2,w_2) \in e(D)$.

As in Section 6.2, we can in two steps infer the following result from
Corollary 8.20.


In this case, $v_1 = w_1$ and $v_2 = w_2$.

Corollary 8.21. Let $k \geq 2$, and let $E$ be a set of nonbasic operations not
containing selection on at least $m$ children satisfying some condition (“$\mathrm{ch}_{\geq m}(\cdot)$”)
for $m > k$. Consider the language $\mathcal{C}(E)$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document,
and let $v_1$ and $v_2$ be nodes of $D$. If $v_1 \equiv^k_\updownarrow v_2$, then $v_1 \equiv_{\exp} v_2$.

Notice that, for $k \geq 3$, Corollary 8.21 is also an immediate consequence
of Corollary 8.5. Because we are dealing with a weaker language, we can also
include the case $k = 2$, however.

We already observed that Proposition 6.6 of Section 6.2 is also applicable to
an important class of languages allowing two-way navigation. The core language
for two-way navigation satisfying both Corollary 8.21 and Proposition 6.6 is
$\mathcal{C}(\downarrow,\uparrow,\pi_1,\pi_2,\mathrm{ch}_{\geq 1}(\cdot),\ldots,\mathrm{ch}_{\geq k}(\cdot),-)$. We call this language the core XPath
algebra with counting up to $k$. Combining the aforementioned results, we obtain
the following.

Theorem 8.22. Let $k \geq 2$, and consider the core XPath algebra with counting
up to $k$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $v_1$ and $v_2$ be nodes of $D$.
Then, $v_1 \equiv_{\exp} v_2$ if and only if $v_1 \equiv^k_\updownarrow v_2$.

By Proposition 2.4, selection on up to two children satisfying some condition
(“$\mathrm{ch}_{\geq m}(\cdot)$,” $1 \leq m \leq 2$) can be expressed in the core XPath algebra. Hence, a
special case arises for $k = 2$:

Corollary 8.23. Consider the core XPath algebra. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a
document, and let $v_1$ and $v_2$ be nodes of $D$. Then, $v_1 \equiv_{\exp} v_2$ if and only if
$v_1 \equiv^2_\updownarrow v_2$.

The proof of Proposition 8.8 can be recast to a proof of the following converse
to Proposition 8.19.

Proposition 8.24. Let $k \geq 2$, and consider the core XPath algebra with counting
up to $k$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $v_1$, $w_1$, $v_2$, and $w_2$ be
nodes of $D$. Assume furthermore that, for each expression $e$ in the language,
$(v_1,w_1) \in e(D)$ implies $(v_2,w_2) \in e(D)$. Then $(v_1,w_1) \gtrsim^k_\updownarrow (v_2,w_2)$.

Combining Propositions 8.19 and 8.24, we obtain the following characterization.

Corollary 8.25. Let $k \geq 2$, and consider the core XPath algebra with counting
up to $k$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $v_1$, $w_1$, $v_2$, and $w_2$ be
nodes of $D$. Then,

1. the property that, for each expression $e$ in the language under consideration,
$(v_1,w_1) \in e(D)$ implies $(v_2,w_2) \in e(D)$ is equivalent to the property
$(v_1,w_1) \gtrsim^k_\updownarrow (v_2,w_2)$; and,

2. the property that, for each expression $e$ in the language under consideration,
$(v_1,w_1) \in e(D)$ if and only if $(v_2,w_2) \in e(D)$ is equivalent to the
property $(v_1,w_1) \cong^k_\updownarrow (v_2,w_2)$.

Inverse (“$\cdot^{-1}$”) is redundant, by the identities in the proof of Proposition 2.3
complemented by $\pi_1(e)^{-1}(D) = \pi_1(e)(D)$ and $\pi_2(e)^{-1}(D) = \pi_2(e)(D)$.


Lemma 8.10 also holds for the core XPath algebra (with the condition $k \geq 3$
replaced by $k \geq 2$). Lemma 8.11 is a different story, unfortunately. Example 8.17
already indicates that, given nodes $v_1$, $w_1$, $v_2$, and $w_2$ of a document $D$, we can
in general not hope for an expression $e_{v_1,w_1}$ such that $(v_2,w_2) \in e_{v_1,w_1}(D)$ if
and only if $(v_1,w_1) \cong^k_\updownarrow (v_2,w_2)$. The version with congruence replaced by
subsumption does hold, however.

Lemma 8.26. Let $k \geq 2$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $v_1$ and $w_1$
be nodes of $D$. There exists an expression $e_{v_1,w_1}$ in the core XPath algebra with
counting up to $k$ such that, for all nodes $v_2$ and $w_2$ of $D$, $(v_2,w_2) \in e_{v_1,w_1}(D)$
if and only if $(v_1,w_1) \gtrsim^k_\updownarrow (v_2,w_2)$.


Proof. The proof follows the lines of the proof of Lemma 8.11 very closely,
the main difference being that, from the proposed expression, the minus term
must be omitted. □


The BP characterization results now follow readily. 

Theorem 8.27. Let $k \geq 2$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let
$R \subseteq V \times V$. Then, there exists an expression $e$ in the core XPath algebra with
counting up to $k$ such that $e(D) = R$ if and only if, for all $v_1, w_1, v_2, w_2 \in V$
with $(v_1,w_1) \gtrsim^k_\updownarrow (v_2,w_2)$, $(v_1,w_1) \in R$ implies $(v_2,w_2) \in R$.

The specialization to the core XPath algebra is as follows. 

Corollary 8.28. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $R \subseteq V \times V$.
There exists an expression $e$ in the core XPath algebra such that $e(D) = R$ if
and only if, for all $v_1, w_1, v_2, w_2 \in V$ with $(v_1,w_1) \gtrsim^2_\updownarrow (v_2,w_2)$, $(v_1,w_1) \in R$
implies $(v_2,w_2) \in R$.

We recast Theorem 8.27 and Corollary 8.28 in terms of node-level navigation.

Theorem 8.29. Let $k \geq 2$. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, let $v$ be a
node of $D$, and let $W \subseteq V$. Then there exists an expression $e$ in the core XPath
algebra with counting up to $k$ such that $e(D)(v) = W$ if and only if, for all
$w_1, w_2 \in V$ with $(v,w_1) \gtrsim^k_\updownarrow (v,w_2)$, $w_1 \in W$ implies $w_2 \in W$.

The specialization to the core XPath algebra is as follows. 

Corollary 8.30. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, let $v$ be a node of $D$, and
let $W \subseteq V$. Then there exists an expression $e$ in the core XPath algebra such
that $e(D)(v) = W$ if and only if, for all $w_1, w_2 \in V$ with $(v,w_1) \gtrsim^2_\updownarrow (v,w_2)$,
$w_1 \in W$ implies $w_2 \in W$.

Finally, for the special case where navigation starts from the root,
Theorem 8.29 and Corollary 8.30 reduce to the following.


Theorem 8.31. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document.

1. For each expression $e$ in the core XPath algebra with counting up to $k$,
$k \geq 2$, there exists an expression $e'$ in the strictly downward (core) XPath
algebra with counting up to $k$ such that $e(D)(r) = e'(D)(r)$.

2. For each expression $e$ in the core XPath algebra, there exists an expression
$e'$ in the strictly downward (core) XPath algebra with counting up to 2
such that $e(D)(r) = e'(D)(r)$.


Together with Theorem 8.16, Theorem 8.31 extends Theorem 6.15. When
navigating from the root, the only thing that the core XPath algebra adds
compared to using the strictly downward (core) XPath algebra is its ability to
select on at least 2 children satisfying some condition.


8.3. Languages without difference for two-way navigation 

As before with languages not containing difference, we do not consider counting
operations, corresponding to considering the various syntactic notions of
relatedness or equivalence between nodes only for the case $k = 1$. Taking into
account Proposition 2.3, and recognizing that the techniques used in this paper
to establish characterizations heavily use intersection, this means that only the
following two languages must be considered:

• the language $\mathcal{X}(\downarrow,\uparrow,\cap)$, which we call the positive XPath algebra; and

• the language $\mathcal{C}(\downarrow,\uparrow,\pi_1,\pi_2,\cap)$, which we call the core positive XPath
algebra.


Some of the present authors showed in [31] that $\mathcal{X}(\downarrow,\uparrow,\cap)$ and $\mathcal{X}(\downarrow,\uparrow,\pi_1,\pi_2)$
are equivalent in expressive power (even at the level of queries). Since obviously
$\mathcal{X}(\downarrow,\uparrow,\pi_1,\pi_2) = \mathcal{C}(\downarrow,\uparrow,\pi_1,\pi_2)$, it follows readily that the positive XPath algebra
and the core positive XPath algebra are equivalent.

The following results were already proved in [31], and are repeated here only for
completeness’ sake.


Theorem 8.32. Consider the (core) positive XPath algebra. Let $D = (V, \mathit{Ed}, r, \lambda)$
be a document, and let $v_1$ and $v_2$ be nodes of $D$. Then,

1. $v_1 \geq_{\mathrm{exp}} v_2$ if and only if $v_1 \geq_1 v_2$; and

2. $v_1 \equiv_{\mathrm{exp}} v_2$ if and only if $v_1 \equiv_1 v_2$.


Theorem 8.33. Consider the (core) positive XPath algebra. Let $D = (V, \mathit{Ed}, r, \lambda)$
be a document, and let $v_1$, $v_2$, $w_1$, and $w_2$ be nodes of $D$. Then,

1. the property that, for each (core) positive XPath expression $e$, $(v_1, w_1) \in e(D)$
implies $(v_2, w_2) \in e(D)$ is equivalent to $(v_1, w_1) \geq_1 (v_2, w_2)$; and

2. the property that, for each expression $e$ in the language under consideration,
$(v_1, w_1) \in e(D)$ if and only if $(v_2, w_2) \in e(D)$ is equivalent to the
property $(v_1, w_1) \equiv_1 (v_2, w_2)$.

As in Section 6.3, we can bootstrap these results to BP-type characterizations.

Theorem 8.34. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $R \subseteq V \times V$.
Then, there exists an expression $e$ in the (core) positive XPath algebra such that
$e(D) = R$ if and only if, for all $v_1, w_1, v_2, w_2 \in V$ with $(v_1, w_1) \geq_1 (v_2, w_2)$,
$(v_1, w_1) \in R$ implies $(v_2, w_2) \in R$.

Finally, Theorem 8.34 can be specialized to the node level, as follows.

Corollary 8.35. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, let $v$ be a node of $D$,
and let $W \subseteq V$. Then there exists an expression $e$ in the (core) positive XPath
algebra such that $e(D)(v) = W$ if and only if, for all nodes $w_1$ and $w_2$ of $D$ with
$(v, w_1) \geq_1 (v, w_2)$, $w_1 \in W$ implies $w_2 \in W$.

Corollary 8.36. Let $D = (V, \mathit{Ed}, r, \lambda)$ be a document, and let $W \subseteq V$. Then
there exists an expression $e$ in the (core) positive XPath algebra such that
$e(D)(r) = W$ if and only if, for all nodes $w_1$ and $w_2$ of $D$ with $w_1 \geq_1 w_2$,
$w_1 \in W$ implies $w_2 \in W$.

Hence, the (core) positive XPath algebra, the weakly downward positive 
(core) XPath algebra, and the strictly downward positive (core) XPath algebra 
are all navigationally equivalent if navigation always starts from the root. 
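
The closure conditions in Theorem 8.34 and Corollary 8.35 can be checked directly once the relevant relation on pairs of nodes is available. The following Python sketch assumes that this relation is supplied by the caller as a Boolean function `related` (how it is computed is outside the scope of the sketch); the function names and the toy relation are hypothetical and serve only as an illustration.

```python
from itertools import product
from typing import Callable, Hashable, Iterable, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def is_closed(candidate: Set[Pair], universe: Iterable[Pair],
              related: Callable[[Pair, Pair], bool]) -> bool:
    """True if, for all p, q in `universe` with related(p, q),
    p in `candidate` implies q in `candidate`."""
    pairs = list(universe)
    return all(q in candidate
               for p in pairs if p in candidate
               for q in pairs if related(p, q))


def is_globally_closed(R: Set[Pair], nodes, related) -> bool:
    """Closure condition on a relation R over all pairs of nodes."""
    return is_closed(R, product(nodes, repeat=2), related)


def is_locally_closed(v, W, nodes, related) -> bool:
    """Closure condition on a node set W for navigation starting at v."""
    return is_closed({(v, w) for w in W}, [(v, w) for w in nodes], related)


# Toy data (assumed, not derived from any document): three nodes and a
# reflexive relation in which the pair (0, 1) is additionally related to (0, 2).
NODES = [0, 1, 2]
TOY_STEPS = {((0, 1), (0, 2))}


def toy_related(p: Pair, q: Pair) -> bool:
    return p == q or (p, q) in TOY_STEPS
```

Under these assumptions, `is_globally_closed` mirrors the condition on R in Theorem 8.34 and `is_locally_closed` the condition on W in Corollary 8.35. Because the toy relation is not symmetric, the set containing only the pair (0, 2) is closed while the set containing only (0, 1) is not. The naive check calls `related` at most quadratically often in the number of pairs considered.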

9. Discussion 

In this paper, we characterized the expressive power of several natural fragments
of XPath at the document level, as summarized in Table Of course, it
is possible to consider other fragments or extensions of the XPath algebra and
its data model. Analyzing these using our two-step methodology in order to further
improve our understanding of the instance expressivity of Tarski’s algebra
is one possible research direction which we have pursued recently [iniiiiiisi]. 

Another future research direction is refining the links between XPath and
finite-variable first-order logics [32]. Indeed, such links have been established
at the level of query semantics. For example, Marx [33] has shown that an extended
version of Core XPath is equivalent to $\mathrm{FO}^2_{\mathrm{tree}}$ (first-order logic using at
most two variables over ordered node-labeled trees), interpreted in the signature
child, descendant, and following_sibling.

Our results establish new links to finite-variable first-order logics at the
document level. For example, we can show that, on a given document, the
XPath algebra and $\mathrm{FO}^3$ (first-order logic with at most three variables) are
equivalent in expressive power. Indeed, as we discussed above, at the document
level, the XPath algebra is equivalent to Tarski’s relation algebra [2] over
trees. Tarski and Givant established the link between Tarski’s algebra
and $\mathrm{FO}^3$. Corollary 1.7 can then be used to give a new characterization, other
than via pebble games [32, 34], of when two nodes in an unordered tree are
indistinguishable in $\mathrm{FO}^3$. In this light, connections between other fragments of
the XPath algebra and finite-variable logics must be examined.

The connection between the XPath algebra and $\mathrm{FO}^3$ also has ramifications
with regard to complexity issues. Indeed, using a result of Grohe [35] which establishes
that expression equivalence for $\mathrm{FO}^3$ is decidable in polynomial time, it
follows readily from Corollaries 8.13 and 8.15 that the global and local definability
problems for the XPath algebra are decidable in polynomial time. Using the
syntactic characterizations in this paper, one can also establish that the global 
and local definability problems for the other fragments of the XPath algebra are 
decidable in polynomial time. As mentioned in the Introduction, this feasibility 
suggests efficient partitioning and reduction techniques on the set of nodes and 
the set of paths in a document. Such techniques might be successfully applied 
towards various aspects of XML document processing such as indexing, access 
control, and document compression. This is another research direction which 
we are currently pursuing [12 Eg. 